A complete, real-world AdWords Data Analysis pipeline from raw Excel to fully interactive dashboards using Excel, Python (pandas), MySQL, and Power BI.
This project mimics an enterprise-level ETL (Extract, Transform, Load) and BI (Business Intelligence) solution, showcasing data cleaning, relational modeling, and dashboarding β ideal for both recruiters and students studying data analytics, BI, or SQL-based solutions.
- π― Project Objective
- π Business Objective
- π Tools & Technologies
- π End-to-End Workflow
- π§Ή Excel + Python Processing
- π Lookup & Fact Tables Description
- π SQL Schema & Relationships
- π§ Relationship Diagram (EER)
- π Power BI Dashboard
- π§© Power BI Data Model View
- β How to Use This Project
- π Repository Structure
- π License
- π¬ About Me
This project analyzes Google AdWords traffic data with the goal of uncovering performance trends, keyword effectiveness, and cost dynamics. It simulates a real-life advertising analytics pipeline, commonly used in marketing and digital performance teams.
To help marketing teams and decision-makers:
- Monitor campaign performance
- Identify high-performing and low-performing keywords
- Track search demand and cost trends over time
- Visualize performance KPIs via intuitive dashboards
| Tool | Purpose |
|---|---|
| Excel | Initial data entry, manual formulas, CSV exports |
| Python | DataFrame manipulation, ID generation, CSV output |
| MySQL | Relational schema modeling & querying |
| Power BI | Data modeling, DAX metrics, visual analytics |
graph LR
A[Raw Excel] --> B[Excel Formulas]
B --> C[Python Script]
C --> D[MySQL Database]
D --> E[Power BI Dashboard]
Each stage builds on the last. The result is a smooth, production-style pipeline from messy input to insights.
- Contains columns like
title,keyword,positions,traffic,CPC, etc. - This is the simulated export from Google AdWords.
Using pandas, we:
- Loaded Excel using
pd.read_excel(). - Created unique
keyword_IDvalues by mapping each keyword to a number. - Example (conceptual):
df['keyword_ID'] = df['keyword'].astype('category').cat.codes
Created three new CSVs:
keyword.csv: Unique list of keywords + IDs.search_volume.csv: Total volume usingSUMIF.keyword_difficulty.csv: Average difficulty usingAVERAGEIF.
These were calculated using Excel formulas:
| Formula | Purpose |
|---|---|
SUMIF() |
Aggregate total search volume |
AVERAGEIF() |
Compute average difficulty |
IF(B2>=50,"Hard","Moderate") |
Assign difficulty label |
VLOOKUP() |
Lookup keyword metadata |
β These tables act as lookup/reference tables for SQL.
This table contains a deduplicated list of all unique keywords with their assigned keyword_ID.
| Column Name | Description |
|---|---|
keyword_ID |
Unique identifier for each keyword (used as primary key) |
keyword |
Actual keyword text, fetched from raw data using VLOOKUP |
β
Created using Excel's VLOOKUP function to map keywords with their IDs.
This table includes the total monthly search volume per keyword.
| Column Name | Description |
|---|---|
keyword_ID |
Foreign key that links to the keyword.csv table |
search_volume |
Monthly search volume, aggregated using Excelβs SUMIF function |
β
Created using SUMIF to calculate the total search volume for each keyword ID.
This table stores the average keyword difficulty score along with a difficulty level label.
| Column Name | Description |
|---|---|
keyword_ID |
Foreign key linking to the keyword.csv table |
avg_difficulty |
Average difficulty score, calculated using AVERAGEIF |
difficulty_level |
Text label derived using Excel's IF function (e.g., "Hard" if β₯ 50) |
β Created using:
AVERAGEIFto compute average difficulty perkeyword_IDIFto categorize as"Hard"or"Moderate"
This table holds all enriched AdWords metrics after processing.
| Column Name | Description |
|---|---|
title |
Campaign or landing page title |
keyword |
Keyword string (linked via keyword_ID) |
keyword_ID |
Foreign key from keyword.csv, created using Python |
positions |
Current average ad position |
previous_positions |
Previous average position |
last_seen |
Last recorded impression/click date |
search_volume |
Total monthly searches |
CPC |
Cost-per-click (USD) |
Traffic |
Estimated traffic from the keyword |
Traffic_Percent |
Percentage of total traffic |
Traffic_Cost |
Cost attributed to keyword traffic |
Traffic_Cost_Percent |
Percentage of total traffic cost |
Competition |
Keyword competition score (0β1 scale) |
Number_of_Results |
Total search engine results for the keyword |
Keyword_difficulty |
Foreign key from keyword_difficulty.csv |
β This is the central fact table, joined with all three lookup tables to support relational queries and visualizations.
Created a MySQL database:
CREATE DATABASE IF NOT EXISTS Traffic_project;
USE Traffic_project;This is the fact table containing AdWords metrics.
CREATE TABLE website_traffic_data (
title VARCHAR(250) NOT NULL,
keyword VARCHAR(250) NOT NULL,
keyword_ID INT NOT NULL,
positions INT NOT NULL,
previous_positions INT NOT NULL,
last_seen DATE NOT NULL,
Search_Volume INT NOT NULL,
CPC decimal(10,2) NOT NULL,
Traffic INT NOT NULL,
Traffic_Percent decimal(10,2) NOT NULL,
Traffic_Cost int NOT NULL,
Traffic_Cost_Percent decimal(10,2) NOT NULL,
Competition decimal(10,2) NOT NULL,
Number_of_Results INT NOT NULL,
Keyword_difficulty INT NOT NULL
);Imported the other CSVs into MySQL:
keywordsearch_volumekeyword_difficulty
Then established relational integrity:
ALTER TABLE website_traffic_data
ADD FOREIGN KEY (keyword_ID) REFERENCES keyword(keyword_ID);These keys ensure consistent data joins between tables.
These diagrams visualize the 1-to-many relationships between:
- Keywords β Traffic Data
- Keyword Difficulty β Traffic Data
- Search Volume β Traffic Data
Connected Power BI to MySQL database and created an interactive dashboard.
- Cards: Total Traffic, Search Volume, Cost, Results
- Line Chart: Traffic over time
- Bar Chart: Traffic by Keyword
- Treemap: Search Volume by Keyword
- Pie/Donut: Traffic by Difficulty, by Month
- Slicers: Year, Quarter, Keyword filter
Average CPC = AVERAGE(website_traffic_data[CPC])
Last Date = MAX('website_traffic_data'[last_seen])
Start Date = MIN('website_traffic_data'[last_seen])
Total Results = SUM(website_traffic_data[Number_of_Results])
Total Search Volume = SUM(website_traffic_data[Search_Volume])
Total Traffic = SUM('website_traffic_data'[Traffic])
Total Traffic Cost = SUM(website_traffic_data[Traffic_Cost])
Traffic Percent = AVERAGE(website_traffic_data[Traffic_Percent])
Calender = CALENDAR([Start Date],[Last Date])
Months = FORMAT(Calender[Date],"MMMM")
Quarter = QUARTER(Calender[Date])
Year = YEAR(Calender[Date])
Year and QTR = 'Calender'[Year] & " QTR " & 'Calender'[Quarter]
These enable filtering, aggregation, and time-based visualizations.
To enable seamless slicing and aggregation, a clean star schema was created in Power BI.
- Fact:
website_traffic_data - Dimensions:
keyword(via keyword_ID)keyword_difficultysearch_volume
β This model ensures accurate filtering and joins.
git clone https://github.com/AdityakumarDA/Adword-Data-Analysis.git- View and understand
Raw_data.xlsx, CSVs - Optionally edit and export again using Excel formulas
- Import all
.csvusing MySQL Workbench - Use provided SQL schema to create and relate tables
- Use
Traffic Project dashboard.pbixto view interactive report - Or connect manually via:
Home β Get Data β MySQL
π¦ Adword-Data-Analysis
β£ π Raw_data.xlsx
β£ π website_traffic_data.csv
β£ π keyword.csv
β£ π keyword_difficulty.csv
β£ π search_volume.csv
β£ π Traffic Data SQL script.sql
β£ π Traffic Project dashboard.pbix
β£ π images
β β£ π· raw_excel_sample.png
β β£ π· excel_lookup_table_1.png
β β£ π· excel_lookup_tables_2png
β β£ π· excel_lookup_table_3.png
β β£ π· main_table.png
β β£ π· mysql_schema_editor.png
β β£ π· EER_Diagram.png
β β£ π· power_bi_dashboard.png
β β π· powerbi_data_model.png
β£ π LICENSE
β π README.md
This project is licensed under the MIT License β you are free to use, modify, and share with attribution.
I'm Aditya Rajput, a data analyst passionate about storytelling with data, unsupervised learning, and real-world analytics.
If you liked this project, please β the repo!
π If you're a recruiter: This demonstrates strong knowledge in data pipelines, transformation, relational modeling, and BI reporting.
π If you're a student: Feel free to use this structure to learn data integration and dashboarding step by step.








