This project cleans and analyzes US household income data using SQL. It resolves duplicates, standardizes entries, and addresses inconsistencies. Exploratory analysis highlights trends in land, water, and income distributions, uncovering regional disparities and offering insights into household income patterns across states and cities.

MatanNafshi/US_Household_Income_SQL

🏑 US Household Income Analysis

This project focuses on cleaning, exploring, and analyzing US household income data using SQL. It aims to resolve data inconsistencies, uncover regional income trends, and provide insights into household income distributions across states and cities.


📊 Project Objectives

  1. Data Cleaning:

    • Resolve duplicate records.
    • Standardize inconsistent entries (e.g., state names, place types).
    • Address missing or incorrect data fields.
  2. Exploratory Data Analysis (EDA):

    • Analyze land and water area distributions by state.
    • Investigate household income patterns (mean and median) across regions.
    • Highlight disparities in income based on location and demographic details.
  3. Data Integration:

    • Combine datasets to enrich analysis and derive deeper insights.
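
As a rough illustration of the integration step, the sketch below joins the two datasets on a shared key. It runs against an in-memory SQLite database through Python's `sqlite3` module. The table names (`income`, `income_stats`), the `id` join key, the column names, and the sample values are all assumptions made for illustration; they are not taken from the project's scripts.

```python
import sqlite3

# In-memory database with two tiny stand-in tables.
# All names and values here are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE income (id INTEGER, state_name TEXT, city TEXT);
    CREATE TABLE income_stats (id INTEGER, mean INTEGER, median INTEGER);

    INSERT INTO income VALUES
        (1, 'Texas', 'Austin'),
        (2, 'Alaska', 'Juneau'),
        (3, 'Ohio', 'Akron');          -- no matching statistics row
    INSERT INTO income_stats VALUES
        (1, 100, 90),
        (2, 80, 70);
""")

# An inner join keeps only rows present in both tables, so every
# location in the result carries its income statistics.
joined = conn.execute("""
    SELECT i.id, i.state_name, i.city, s.mean, s.median
    FROM income AS i
    INNER JOIN income_stats AS s ON s.id = i.id
    ORDER BY i.id
""").fetchall()

for row in joined:
    print(row)
```

An inner join silently drops locations without statistics (row 3 here); a `LEFT JOIN` would keep them with `NULL` income values, which may be preferable when auditing coverage.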

πŸ“ Datasets Used

  1. USHouseholdIncome.csv:

    • Contains geographical and demographic household income data.
    • Includes details like state, county, city, area type, and land/water area.
  2. USHouseholdIncome_Statistics.csv:

    • Provides statistical metrics like mean and median household incomes.

🛠️ Key Features

Data Cleaning

  • Duplicate Removal: Ensures unique records for accurate analysis.
  • Inconsistency Fixes: Standardizes state names, place types, and other fields.
  • Missing Data Handling: Updates empty fields with relevant information.
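
The three cleaning steps above might look roughly like the following. This is a minimal sketch using SQLite via Python's `sqlite3`, not the contents of `data_cleaning.sql`. The table layout, the surrogate `row_id` column, the lowercase `'texas'` variant, and the rule for filling an empty `place` are all assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: row_id is a surrogate key, id is the record key.
    CREATE TABLE income (
        row_id     INTEGER PRIMARY KEY,
        id         INTEGER,
        state_name TEXT,
        city       TEXT,
        place      TEXT
    );
    INSERT INTO income (id, state_name, city, place) VALUES
        (101, 'Texas',  'Austin', 'Austin'),
        (101, 'Texas',  'Austin', 'Austin'),   -- duplicate of the row above
        (102, 'texas',  'Dallas', 'Dallas'),   -- inconsistent casing
        (103, 'Alaska', 'Juneau', ''),         -- empty place field
        (104, 'Alaska', 'Juneau', 'Juneau');
""")

# 1. Duplicate removal: keep the first row seen for each id.
conn.execute("""
    DELETE FROM income
    WHERE row_id NOT IN (SELECT MIN(row_id) FROM income GROUP BY id)
""")

# 2. Inconsistency fix: map a known variant to its canonical spelling.
conn.execute("""
    UPDATE income
    SET state_name = 'Texas'
    WHERE LOWER(TRIM(state_name)) = 'texas'
""")

# 3. Missing data: fill an empty place from another row that shares
#    the same state and city (assumed rule, for illustration only).
conn.execute("""
    UPDATE income
    SET place = (
        SELECT MAX(o.place)
        FROM income AS o
        WHERE o.state_name = income.state_name
          AND o.city = income.city
          AND o.place <> ''
    )
    WHERE place IS NULL OR place = ''
""")

cleaned = conn.execute(
    "SELECT id, state_name, city, place FROM income ORDER BY id"
).fetchall()
for row in cleaned:
    print(row)
```

Note that step 3 sets `place` to `NULL` when no other row can supply a value, so it is worth checking for remaining gaps afterwards.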

Data Analysis

  • Land & Water Area: Identifies states with the largest areas.
  • Income Trends: Explores average income distributions across states and cities.
  • Regional Disparities: Highlights significant income gaps.
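
The analyses listed above come down to grouped aggregates. The sketch below shows one possible shape for them, again in SQLite via Python's `sqlite3`. The table and column names (`aland`, `awater`, `mean`, `median`), the placeholder states, and the choice to treat a mean of `0` as missing are assumptions, and the numbers are invented test values rather than results from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with made-up values.
    CREATE TABLE income (id INTEGER, state_name TEXT, aland INTEGER, awater INTEGER);
    CREATE TABLE income_stats (id INTEGER, mean INTEGER, median INTEGER);

    INSERT INTO income VALUES
        (1, 'State A', 500, 50),
        (2, 'State A', 300, 10),
        (3, 'State B', 200, 80),
        (4, 'State B', 100,  5);
    INSERT INTO income_stats VALUES
        (1, 60000, 50000),
        (2, 40000, 30000),
        (3,     0,     0),   -- assumed to mean "no data reported"
        (4, 30000, 25000);
""")

# Land and water area: total per state, largest land area first.
area_by_state = conn.execute("""
    SELECT state_name, SUM(aland) AS total_land, SUM(awater) AS total_water
    FROM income
    GROUP BY state_name
    ORDER BY total_land DESC
""").fetchall()

# Income trends: average mean and median income per state,
# skipping rows whose mean is 0 so they do not drag averages down.
income_by_state = conn.execute("""
    SELECT i.state_name,
           ROUND(AVG(s.mean), 1)   AS avg_mean,
           ROUND(AVG(s.median), 1) AS avg_median
    FROM income AS i
    JOIN income_stats AS s ON s.id = i.id
    WHERE s.mean <> 0
    GROUP BY i.state_name
    ORDER BY avg_mean DESC
""").fetchall()

print(area_by_state)
print(income_by_state)
```

Comparing `avg_mean` across the ordered rows is one simple way to surface regional gaps; grouping by city or place type instead of state follows the same pattern.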

🗂️ File Structure

  • datasets/:
    • USHouseholdIncome.csv
    • USHouseholdIncome_Statistics.csv
  • scripts/:
    • data_cleaning.sql: SQL scripts for data cleaning and preparation.
    • eda_analysis.sql: SQL scripts for exploratory data analysis.
  • README.md: Project overview and instructions.

🚀 How to Run

  1. Import Datasets: Load the CSV files into your SQL database.
  2. Execute Scripts:
    • Run data_cleaning.sql for cleaning and preparation.
    • Run eda_analysis.sql for exploratory analysis.
  3. Visualize Results: Use SQL query outputs to generate insights or visualizations.
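
The import step depends on your database. As one possible approach, the sketch below loads a CSV into SQLite with Python's standard `csv` and `sqlite3` modules. The helper `load_csv`, the table name `us_household_income`, and the sample header are hypothetical. The project's SQL scripts may target a different SQL dialect, so they may need adjusting before they run on SQLite.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, stream):
    """Create `table` from the CSV header row and insert every data row.

    Columns are created without types, so all values are stored as text.
    """
    reader = csv.reader(stream)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")

# Stand-in for a real file; with the repository layout this would be e.g.
#   with open("datasets/USHouseholdIncome.csv", newline="") as f:
#       load_csv(conn, "us_household_income", f)
sample = io.StringIO("id,State_Name,City\n1,Texas,Austin\n2,Alaska,Juneau\n")
load_csv(conn, "us_household_income", sample)

row_count = conn.execute(
    'SELECT COUNT(*) FROM "us_household_income"'
).fetchone()[0]
print(row_count)
```

Because every value is loaded as text, numeric columns need a `CAST` (or typed column definitions) before they are summed or averaged.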

πŸ“ Sample Insights

  • States with the largest land and water areas include Texas and Alaska.
  • Regions like Puerto Rico exhibit significantly lower median incomes.
  • Urban areas show higher mean incomes than rural and suburban regions.

📬 Contact

For questions or suggestions, feel free to reach out!

LinkedIn: https://www.LinkedIn.com/in/matan-nafshi

Email: matannaf@gmail.com

Phone: +972 528777096
