📊 Netflix Data Analysis Project

This project explores and analyzes the Netflix Movies and TV Shows dataset using Power BI and Python. It covers a complete data cleaning workflow, feature extraction, and sentiment analysis, followed by visualizations in Power BI.
🧹 Data Cleaning and Preprocessing Steps
- Column Profiling: Inspected column data types and value distributions. Identified key fields such as type, title, director, cast, country, release_year, and rating.
- Dealing with Missing Values: Detected nulls in fields such as director, cast, and country. Blank entries were standardized using pandas.
- Encoding Nulls: Replaced empty strings and whitespace-only values with NaN for consistency.
- Imputing Missing Values: Filled missing country and director values with 'Unknown'. Rows missing critical data were dropped or flagged.
- Working with Dates: Converted date_added to a datetime type. Extracted year_added and month_added for time-based analysis.
- Adding New Columns: Derived content_age = year_added - release_year (the number of years between a title's release and its addition to Netflix). Added primary_country and primary_genre from comma-separated lists.
- Splitting / Extracting Data: Split multi-value fields such as cast, genres, and country.
- Extracting First Item: Used .str.split(',').str[0] to extract the first country, genre, or actor for analysis.
- Text / Sentiment Analysis: Cleaned the description text with NLP preprocessing, then performed sentiment analysis using TextBlob/VADER.
- Filtering Unnecessary Data: Removed redundant columns and low-quality rows. Focused on movies and TV shows with meaningful metadata.
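
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook: the sample rows are made up, the genre column is called genres to match the text (the raw file may name it differently), the date format string is an assumption, and content_age is computed as year_added minus release_year so that it is non-negative for titles added after release. The helper names clean and explode_list_column are hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real dataset (illustrative values only).
raw = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "title": ["Title A", "Title B", "Title C", "Title D"],
        "director": ["Director One", "", "   ", "Director Two"],
        "cast": ["Actor X, Actor Y", "Actor Z", "", "Actor W"],
        "country": ["United States, India", "", "Japan", "United Kingdom"],
        "date_added": ["September 25, 2021", "January 1, 2020", "", "March 3, 2019"],
        "release_year": [2020, 2018, 2015, 2019],
        "rating": ["PG-13", "TV-MA", "R", "PG"],
        "genres": ["Dramas, Comedies", "Crime TV Shows", "Anime Features", "Documentaries"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the null encoding, imputation, date, and derived-column steps."""
    out = df.copy()

    # Encoding nulls: empty or whitespace-only strings become NaN.
    out = out.replace(r"^\s*$", np.nan, regex=True)

    # Imputing: fill missing country and director with 'Unknown'.
    out[["country", "director"]] = out[["country", "director"]].fillna("Unknown")

    # Dates: parse date_added; missing or unparseable values become NaT.
    # The format string is an assumption about how dates are written.
    out["date_added"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["year_added"] = out["date_added"].dt.year.astype("Int64")
    out["month_added"] = out["date_added"].dt.month.astype("Int64")

    # Derived column: years between release and addition (sign is an assumption).
    out["content_age"] = out["year_added"] - out["release_year"]

    # First item of each comma-separated list.
    out["primary_country"] = out["country"].str.split(",").str[0].str.strip()
    out["primary_genre"] = out["genres"].str.split(",").str[0].str.strip()

    # Flag, rather than drop, rows that lack a usable date_added.
    out["missing_date_added"] = out["date_added"].isna()
    return out


def explode_list_column(df: pd.DataFrame, column: str, new_name: str) -> pd.DataFrame:
    """Return one row per item of a comma-separated column."""
    long = df.assign(**{new_name: df[column].str.split(",")})
    long = long.explode(new_name, ignore_index=True)
    long[new_name] = long[new_name].str.strip()
    return long


cleaned = clean(raw)
cast_long = explode_list_column(cleaned, "cast", "cast_member")
print(cleaned[["title", "director", "primary_country", "year_added", "content_age"]])
```

Flagging rows with a boolean column instead of dropping them keeps the choice of what counts as "critical" visible downstream; swapping in dropna(subset=[...]) would implement the drop variant.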
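
For the text and sentiment step, one possible shape is sketched below. It assumes "cleaning" means lowercasing and stripping non-letter characters, uses TextBlob's sentiment.polarity score, and maps scores to labels with a threshold of 0.05; the threshold and the helper names clean_description, polarity, and label are illustrative assumptions, not taken from the project. TextBlob is imported inside the function so the cleaning helper runs without it installed.

```python
import re


def clean_description(text: str) -> str:
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def polarity(text: str) -> float:
    """Polarity score in [-1, 1] from TextBlob (requires `pip install textblob`)."""
    from textblob import TextBlob  # lazy import: only needed when scoring

    return TextBlob(text).sentiment.polarity


def label(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score to a coarse label; the threshold is an assumed value."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Example usage on a made-up description:
# label(polarity(clean_description("A warm, funny story about family.")))
```

If VADER is used instead, it may be preferable to score the raw description, since VADER takes capitalization and punctuation into account.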
📊 Visual Analysis with Power BI
✅ Count of Shows by Type: Movies: 1.43K (55.42%); TV Shows: 1.15K (44.58%).
✅ Sum of release_year by Country: Visualized content production over time by geography. The USA, India, the UK, and Japan account for the largest content counts.
✅ Rating Distribution by Type: Explored how ratings vary across Movies and TV Shows.
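
The aggregations behind these visuals can be cross-checked in pandas before building the report. The sketch below uses a made-up five-row table, so its numbers do not reproduce the figures above; it assumes a primary_country column like the one derived during cleaning.

```python
import pandas as pd

# Made-up rows for illustration; the real figures come from the cleaned dataset.
titles = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
        "primary_country": ["United States", "India", "United States", "Japan", "United Kingdom"],
        "rating": ["PG-13", "TV-14", "TV-MA", "PG-13", "TV-MA"],
    }
)

# Count and percentage share of titles by type.
type_counts = titles["type"].value_counts()
type_share = (titles["type"].value_counts(normalize=True) * 100).round(2)

# Number of titles per primary country, largest first.
country_counts = titles["primary_country"].value_counts()

# Rating distribution split by type (ratings as rows, types as columns).
rating_by_type = pd.crosstab(titles["rating"], titles["type"])

print(type_counts, type_share, country_counts, rating_by_type, sep="\n\n")
```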
🔧 Tools Used
- Power BI for data visualization
- Jupyter Notebook / Colab for data analysis
📌 Dataset Source
Netflix Movies and TV Shows (Kaggle)