This repository contains my solutions for the Fetch Rewards Data Modeling & Analysis Challenge, which involved structuring unstructured JSON data, writing SQL queries, identifying data quality issues, and communicating insights to stakeholders.
π Project Overview
This project required transforming raw, unstructured JSON data into a structured relational data model, writing business-focused SQL queries, and evaluating data quality issues. The key objectives included:
βοΈ Data Modeling: Designing a structured relational database schema based on provided sample data.
βοΈ SQL Queries: Answering business questions using optimized queries.
βοΈ Data Quality Assessment: Identifying potential inconsistencies, missing values, and anomalies.
βοΈ Stakeholder Communication: Writing a clear and concise business report.
π Project Deliverables:
- Relational Data Model
- Reviewed unstructured JSON data and created a structured relational ER Diagram.
- Defined tables, fields, primary keys, and foreign keys for optimal data organization.
- ER Diagram Link: ...
- Business-Focused SQL Queries:
SQL queries were written to answer the following key business questions:
- Query 1: Top 5 brands by receipts scanned for the most recent month. ...
- Query 2: Average spend comparison for 'Accepted' vs. 'Rejected' receipts. ...
- Query 3: Total items purchased comparison for 'Accepted' vs. 'Rejected' receipts. ...
- Query 4: Brand with the highest spend among users created within the past 6 months. ...
- Query 5: Brand with the most transactions among users created within the past 6 months. ...
- Data Quality Assessment
Using Python and SQL, I analyzed the dataset for potential data quality issues, including:
- Missing or inconsistent data in key fields.
- Duplicate records and anomalies.
- Inconsistent date formats and incorrect timestamps.
- Irregularities in rewardsReceiptStatus values.
π Technologies & Tools Used:
- SQL: Querying and analyzing structured data.
- Python (Pandas, NumPy): Data exploration, anomaly detection, and validation.
- Database Modeling: Designing relational schema for structured storage.
- Diagramming Tools (Visio): ER diagram creation for data structure visualization.
- Google BigQuery: Database creation and querying
π Key Learning Outcomes:
π Data Modeling & Normalization:
- Learned how to analyze raw JSON data and design a relational schema for a data warehouse.
- Understood the importance of primary and foreign keys in maintaining data integrity.
π SQL Query Optimization: - Gained experience in writing efficient SQL queries to extract business insights.
- Applied ranking functions, aggregation, and filtering to answer real-world business questions.
π Data Quality Analysis: - Discovered common data quality issues, such as duplicate entries, missing values, and format inconsistencies.
- Used Python (Pandas, NumPy) and SQL to identify and address anomalies in the dataset.
π Effective Communication: - Developed clear and concise business reports and email communication for non-technical stakeholders.
- Practised translating technical findings into actionable business recommendations.
Siddharth Gada
π§ Email: gadasiddharth@gmail.com
π LinkedIn: https://www.linkedin.com/in/siddharthgada/