Skip to content

Database creation and data cleaning project for Fetch using Google BigQuery

Notifications You must be signed in to change notification settings

siddharthgada/Fetch-Rewards-Exercise

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

69 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Fetch Rewards Data Modeling & Analysis

This repository contains my solutions for the Fetch Rewards Data Modeling & Analysis Challenge, which involved structuring unstructured JSON data, writing SQL queries, identifying data quality issues, and communicating insights to stakeholders.

πŸš€ Project Overview
This project required transforming raw, unstructured JSON data into a structured relational data model, writing business-focused SQL queries, and evaluating data quality issues. The key objectives included:

βœ”οΈ Data Modeling: Designing a structured relational database schema based on provided sample data.
βœ”οΈ SQL Queries: Answering business questions using optimized queries.
βœ”οΈ Data Quality Assessment: Identifying potential inconsistencies, missing values, and anomalies.
βœ”οΈ Stakeholder Communication: Writing a clear and concise business report.

πŸ“‚ Project Deliverables:

  1. Relational Data Model
  • Reviewed unstructured JSON data and created a structured relational ER Diagram.
  • Defined tables, fields, primary keys, and foreign keys for optimal data organization.
  • ER Diagram Link: ...
  1. Business-Focused SQL Queries:
    SQL queries were written to answer the following key business questions:
  • Query 1: Top 5 brands by receipts scanned for the most recent month. ...
  • Query 2: Average spend comparison for 'Accepted' vs. 'Rejected' receipts. ...
  • Query 3: Total items purchased comparison for 'Accepted' vs. 'Rejected' receipts. ...
  • Query 4: Brand with the highest spend among users created within the past 6 months. ...
  • Query 5: Brand with the most transactions among users created within the past 6 months. ...
  1. Data Quality Assessment
    Using Python and SQL, I analyzed the dataset for potential data quality issues, including:
  • Missing or inconsistent data in key fields.
  • Duplicate records and anomalies.
  • Inconsistent date formats and incorrect timestamps.
  • Irregularities in rewardsReceiptStatus values.

πŸ“Š Technologies & Tools Used:

  1. SQL: Querying and analyzing structured data.
  2. Python (Pandas, NumPy): Data exploration, anomaly detection, and validation.
  3. Database Modeling: Designing relational schema for structured storage.
  4. Diagramming Tools (Visio): ER diagram creation for data structure visualization.
  5. Google BigQuery: Database creation and querying

πŸ” Key Learning Outcomes:
πŸ“Œ Data Modeling & Normalization:

  • Learned how to analyze raw JSON data and design a relational schema for a data warehouse.
  • Understood the importance of primary and foreign keys in maintaining data integrity.
    πŸ“Œ SQL Query Optimization:
  • Gained experience in writing efficient SQL queries to extract business insights.
  • Applied ranking functions, aggregation, and filtering to answer real-world business questions.
    πŸ“Œ Data Quality Analysis:
  • Discovered common data quality issues, such as duplicate entries, missing values, and format inconsistencies.
  • Used Python (Pandas, NumPy) and SQL to identify and address anomalies in the dataset.
    πŸ“Œ Effective Communication:
  • Developed clear and concise business reports and email communication for non-technical stakeholders.
  • Practised translating technical findings into actionable business recommendations.

πŸ‘€ Author

Siddharth Gada
πŸ“§ Email: gadasiddharth@gmail.com
πŸ”— LinkedIn: https://www.linkedin.com/in/siddharthgada/

About

Database creation and data cleaning project for Fetch using Google BigQuery

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published