
A full exploratory data analysis (EDA) of Netflix’s global content catalog. The project includes cleaning and transforming the dataset, and building an interactive Power BI dashboard to uncover key patterns such as genre distribution, release trends, and content ratings. Tech Stack: Power BI, SQL, Python, Pandas


daemon966/Netflix_EDA


📊 Netflix Data Analysis Project

[Dashboard screenshot, 2025-05-26]

This project explores and analyzes Netflix's catalog dataset using Power BI and Python. It covers a complete data-cleaning workflow, feature extraction, and sentiment analysis of title descriptions, followed by dashboard visualizations.

🧹 Data Cleaning and Preprocessing Steps

  1. Column Profiling: Inspected column data types and value distributions, and identified key fields such as type, title, director, cast, country, release_year, and rating.

  2. Dealing with Missing Values: Detected nulls in fields such as director, cast, and country, and standardized blank entries using Pandas.

  3. Encoding Nulls: Replaced empty and whitespace-only strings with NaN for consistency.

  4. Imputing Missing Values: Filled missing country and director values with 'Unknown'. Rows with critical missing data were dropped or flagged.

  5. Working with Dates: Converted date_added to the datetime type and extracted year_added and month_added for time-based analysis.

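  6. Adding New Columns: Derived content_age = year_added - release_year, the number of years between a title's release and its addition to Netflix (the reverse order would make the age negative for nearly every title). Added primary_country and primary_genre from the comma-separated lists.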

  7. Splitting / Extracting Data: Split multi-valued fields such as cast, genres, and country.

  8. Extracting First Item: Used .str.split(',').str[0] to extract the first country, genre, or actor for analysis.

  9. Text / Sentiment Analysis: Cleaned the description text with NLP preprocessing and performed sentiment analysis using TextBlob/VADER.

  10. Filtering Unnecessary Data: Removed redundant columns and low-quality rows, focusing on movies and TV shows with meaningful metadata.
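Steps 1 to 4 (profiling, encoding nulls, and imputation) could be sketched in Pandas roughly as follows. This is a minimal illustration, not the project's actual notebook: the toy rows are hypothetical, and treating a missing title or type as "critical" (and dropping those rows) is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the Kaggle netflix_titles.csv file.
raw = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "title": ["A", "B", "C", "  "],
    "director": ["Jane Doe", "", None, "John Roe"],
    "cast": ["X, Y", "   ", "Z", None],
    "country": ["United States, India", None, " ", "Japan"],
    "release_year": [2019, 2020, 2018, 2021],
    "rating": ["PG-13", "TV-MA", None, "R"],
})


def clean_catalog(df: pd.DataFrame) -> pd.DataFrame:
    """Encode blank strings as NaN, impute descriptive fields, drop unusable rows."""
    df = df.copy()
    # Steps 2-3: empty or whitespace-only strings become NaN (numeric columns are untouched).
    df = df.replace(r"^\s*$", np.nan, regex=True)
    # Step 4: fill descriptive fields with a sentinel value instead of dropping rows.
    df[["director", "country"]] = df[["director", "country"]].fillna("Unknown")
    # Assumption: a row without a title or type is "critical" missing data and is dropped.
    df = df.dropna(subset=["title", "type"])
    return df


# Step 1: profile dtypes and missing-value counts before cleaning.
print(raw.dtypes)
print(raw.isna().sum())

clean = clean_catalog(raw)
print(clean)
```

Filling director and country with a sentinel keeps those titles available for type and rating counts, while cast and rating remain NaN here so that later steps can decide how to treat them.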
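Steps 5 to 8 (date parsing, derived columns, and splitting multi-valued fields) might look like this in Pandas. It is a sketch under stated assumptions: date_added is assumed to hold text such as "September 25, 2021", and the genre column is assumed to be named listed_in, as in the Kaggle file; the rows themselves are hypothetical. content_age is computed as year_added minus release_year so that it is non-negative for titles added after their release.

```python
import pandas as pd

# Hypothetical rows; column names follow the Kaggle netflix_titles.csv layout (assumed).
df = pd.DataFrame({
    "date_added": ["September 25, 2021", " August 4, 2017", None],
    "release_year": [2020, 2015, 2019],
    "country": ["United States, India", "United Kingdom", "Japan, United States"],
    "listed_in": ["Dramas, International Movies", "Docuseries", "Anime Series, TV Comedies"],
})

# Step 5: strip stray whitespace, then parse; unparseable or missing dates become NaT.
added = pd.to_datetime(df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce")
df["year_added"] = added.dt.year.astype("Int64")   # nullable integer keeps <NA>
df["month_added"] = added.dt.month.astype("Int64")

# Step 6: years between release and addition to the catalog.
df["content_age"] = df["year_added"] - df["release_year"]

# Step 8: first item of each comma-separated list, with surrounding spaces removed.
df["primary_country"] = df["country"].str.split(",").str[0].str.strip()
df["primary_genre"] = df["listed_in"].str.split(",").str[0].str.strip()

# Step 7: one row per country, so co-productions count toward every listed country.
countries = df["country"].str.split(",").explode().str.strip()
country_counts = countries.value_counts()

print(df)
print(country_counts)
```

Counting the exploded values credits every country on a co-production, whereas primary_country credits only the first; the two can rank countries differently.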
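For step 9, a minimal sketch of the text cleaning and of turning a sentiment score into a label is shown below. It does not call TextBlob or VADER itself; the score is assumed to be a VADER-style compound value in [-1, 1], which the vaderSentiment package returns as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. The ±0.05 cut-offs follow the convention suggested in VADER's documentation; the helper names are hypothetical. Note that VADER uses capitalization and punctuation as intensity cues, so it is usually given the raw description, while the cleaned text suits TextBlob or keyword analysis.

```python
import re


def clean_description(text: str) -> str:
    """Lowercase, replace characters other than letters, digits, spaces, and
    apostrophes with spaces, then collapse repeated whitespace (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a label (assumed thresholds)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(clean_description("A Detective's   LAST case!"))
print([label_sentiment(s) for s in (0.62, -0.3, 0.01)])
```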

📊 Visual Analysis with Power BI

✅ Count of Shows by Type: Movies 1.43K (55.42%), TV Shows 1.15K (44.58%).

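✅ Sum of release_year by Country: Compared content production across countries. Because every title adds its release year to the sum, this measure grows roughly in proportion to the number of titles per country, so it serves as a proxy for title count. The USA, India, the UK, and Japan account for the most content.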

✅ Rating Distribution by Type: Explored how ratings vary across Movies and TV Shows.
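The three visuals above can be cross-checked in Pandas before building the dashboard. This is a sketch on hypothetical toy rows, not the project's data, so the numbers it produces do not reproduce the 1.43K / 1.15K split; the primary_country column is assumed to be the derived column from the cleaning steps.

```python
import pandas as pd

# Hypothetical cleaned rows.
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "rating": ["PG-13", "R", "TV-MA", "PG-13", "TV-14"],
    "primary_country": ["United States", "India", "United States", "Japan", "United Kingdom"],
    "release_year": [2019, 2020, 2021, 2018, 2020],
})

# Count of titles by type and each type's percentage share.
type_counts = titles["type"].value_counts()
type_share = titles["type"].value_counts(normalize=True).mul(100).round(2)

# Per country: a direct title count next to the summed release_year.
by_country = titles.groupby("primary_country").agg(
    titles=("release_year", "count"),
    release_year_sum=("release_year", "sum"),
).sort_values("titles", ascending=False)

# Rating distribution by type as a contingency table (rows: rating, columns: type).
rating_by_type = pd.crosstab(titles["rating"], titles["type"])

print(type_counts, type_share, by_country, rating_by_type, sep="\n\n")
```

Placing the title count next to the summed release_year makes it easy to confirm that both rank countries the same way; the count is the more direct measure.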

🔧 Tools Used: Power BI for data visualization; Jupyter Notebook / Colab for data analysis.

📌 Dataset Source: Netflix Movies and TV Shows (Kaggle)
