This repository contains Python code for building a movie recommendation system using collaborative filtering techniques. Below is a breakdown of the files and functionalities included:
- Clone this repository and change into its directory: `git clone https://github.com/sadegh15khedry/MovieRecommendationSystem.git`, then `cd MovieRecommendationSystem`.
- Install the required libraries with conda, using the `environment.yml` file: `conda env create -f environment.yml`.
- Download the MovieLens datasets (`movies.csv`, `tags.csv`, `ratings.csv`) and update the paths to them in the code.
- Run the `recommendation_system.ipynb` notebook to generate movie recommendations.
- Load the datasets (`movies.csv`, `tags.csv`, `ratings.csv`) using pandas.
- Select the relevant columns into `tag_df`, `rating_df`, and `movie_df` for further analysis.
- Perform exploratory data analysis (EDA) to understand data shapes, missing values, duplicates, and basic statistics.
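The loading and EDA steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the in-memory CSV text stands in for the real MovieLens files, the selected columns are an assumption, and `summarize` is a hypothetical helper.

```python
import io

import pandas as pd

# Toy stand-ins for the MovieLens files (assumption); in the notebook these
# would be paths to the downloaded ratings.csv and movies.csv.
ratings_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "1,20,3.5,964981247\n"
    "2,10,5.0,964982224\n"
    "2,10,5.0,964982224\n"  # deliberate duplicate row
)
movies_csv = io.StringIO(
    "movieId,title,genres\n"
    "10,Toy Story (1995),Animation\n"
    "20,Heat (1995),\n"  # deliberately missing genres value
)

# Load each file and keep only the columns needed later (assumed selection).
rating_df = pd.read_csv(ratings_csv)[["userId", "movieId", "rating"]]
movie_df = pd.read_csv(movies_csv)[["movieId", "title", "genres"]]


def summarize(name, frame):
    """Hypothetical helper: report shape, missing values, and duplicates."""
    summary = {
        "shape": frame.shape,
        "missing": int(frame.isna().sum().sum()),
        "duplicates": int(frame.duplicated().sum()),
    }
    print(name, summary)
    return summary


rating_summary = summarize("rating_df", rating_df)
movie_summary = summarize("movie_df", movie_df)

# Basic statistics of the rating column.
print(rating_df["rating"].describe())
```

The same pattern would apply to `tags.csv` and `tag_df`.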
- Merge `rating_df` and `movie_df` on `movieId` to create a combined DataFrame (`df`).
- Aggregate ratings to find movies with more than 100 ratings (`agg_df`).
- Merge `df` with `agg_df` to filter out less popular movies (`df_gt100`).
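A rough sketch of the merge-and-filter steps above, using toy data. The threshold is lowered from 100 so that the small example keeps at least one movie; `MIN_RATINGS`, `rating_counts`, and the aggregate column names are hypothetical and may differ from the notebook.

```python
import pandas as pd

# Toy ratings and movies (assumption); the real data comes from MovieLens.
rating_df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 3],
    "movieId": [10, 10, 10, 20, 20, 30],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 5.0],
})
movie_df = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

MIN_RATINGS = 2  # the README uses 100; lowered here for the toy data

# Combine ratings with movie titles.
df = pd.merge(rating_df, movie_df, on="movieId", how="inner")

# Count ratings (and average them) per movie.
rating_counts = (
    df.groupby("title")
    .agg(mean_rating=("rating", "mean"), number_of_ratings=("rating", "count"))
    .reset_index()
)

# Keep only movies with more than MIN_RATINGS ratings.
agg_df = rating_counts[rating_counts["number_of_ratings"] > MIN_RATINGS]

# Restrict the combined frame to those popular movies.
df_gt100 = pd.merge(df, agg_df[["title"]], on="title", how="inner")
print(df_gt100)
```

Filtering on rating count before building the matrix keeps very sparsely rated movies from dominating the similarity computation.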
- Create a user-movie matrix (`user_movie_matrix`) using `pivot_table`, where rows represent users, columns represent movies, and values represent ratings.
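As an illustration of the pivot step, assuming the filtered frame has `userId`, `title`, and `rating` columns (the toy rows are made up):

```python
import pandas as pd

# Toy filtered ratings (assumption): one row per (user, movie) rating.
df_gt100 = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating": [4.0, 2.0, 5.0, 3.0, 1.0],
})

# Rows are users, columns are movies, values are ratings.
# Movies a user has not rated appear as NaN.
user_movie_matrix = df_gt100.pivot_table(
    index="userId", columns="title", values="rating"
)
print(user_movie_matrix)
```

Note that `pivot_table` averages duplicate (user, movie) pairs by default, so repeated ratings would be collapsed to their mean.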
- Normalize `user_movie_matrix` (`matrix_norm`) by subtracting the mean rating of each user.
- Calculate cosine similarities (`user_similarity` and `movie_similarity`) based on `matrix_norm` to find similar users and movies.
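The normalization and similarity steps might look like the sketch below. Cosine similarity is computed directly with NumPy here to keep the example self-contained; the notebook may use a library helper instead. Treating missing ratings as 0 after mean-centering is an assumption, and `cosine_similarity_frame` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy user-movie matrix (assumption); NaN means "not rated".
user_movie_matrix = pd.DataFrame(
    {
        "Movie A": [4.0, 5.0, 1.0],
        "Movie B": [2.0, np.nan, 5.0],
        "Movie C": [np.nan, 3.0, 4.0],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# Subtract each user's mean rating row-wise; NaN entries stay NaN.
matrix_norm = user_movie_matrix.sub(user_movie_matrix.mean(axis=1), axis=0)


def cosine_similarity_frame(frame):
    """Pairwise cosine similarity between rows of `frame` (NaN treated as 0)."""
    values = frame.fillna(0).to_numpy()
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=frame.index, columns=frame.index)


# User-user similarity uses rows; movie-movie similarity uses the transpose.
user_similarity = cosine_similarity_frame(matrix_norm)
movie_similarity = cosine_similarity_frame(matrix_norm.T)
print(user_similarity.round(2))
```

Mean-centering means a user who rates everything highly is not automatically treated as similar to every other generous rater; only deviations from each user's own average count.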
- Select a user (`picked_userId`) and set up variables (`number_of_simlar_users`, `user_similarity_threshold`).
- Find similar users (`similar_users`) based on a similarity threshold.
- Identify movies watched by the selected user (`picked_user_watched`) and by similar users (`similar_users_movies`).
- Calculate item scores (`item_score`) based on weighted sums of ratings from similar users.
- Sort and print the top recommended movies (`ranked_item_score`) based on their scores.
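The recommendation steps above can be sketched roughly as follows. The toy matrix and similarity values are made up, `similarity_to_picked` is a hypothetical name for the picked user's column of `user_similarity`, and averaging the similarity-weighted ratings over the users who rated each movie is one plausible weighting, not necessarily the notebook's.

```python
import numpy as np
import pandas as pd

# Toy mean-centered ratings (assumption); NaN means "not rated".
matrix_norm = pd.DataFrame(
    {
        "Movie A": [1.0, 0.5, -1.0, np.nan],
        "Movie B": [-1.0, -0.5, 1.0, 1.0],
        "Movie C": [np.nan, 1.0, np.nan, -1.0],
        "Movie D": [np.nan, -1.0, 0.5, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)
# Toy similarity of every user to user 1 (illustrative values).
similarity_to_picked = pd.Series({1: 1.0, 2: 0.9, 3: -0.8, 4: 0.4})

picked_userId = 1
number_of_simlar_users = 10  # spelling follows the README's variable name
user_similarity_threshold = 0.3

# Most similar users above the threshold, excluding the picked user.
similar_users = (
    similarity_to_picked.drop(index=picked_userId)
    .loc[lambda s: s > user_similarity_threshold]
    .sort_values(ascending=False)
    .head(number_of_simlar_users)
)

# Movies the picked user rated, and unseen movies rated by similar users.
picked_user_watched = matrix_norm.loc[[picked_userId]].dropna(axis=1, how="all")
similar_users_movies = (
    matrix_norm.loc[similar_users.index]
    .dropna(axis=1, how="all")
    .drop(columns=picked_user_watched.columns, errors="ignore")
)

# Score each unseen movie: similarity-weighted ratings, averaged over raters.
weighted = similar_users_movies.mul(similar_users, axis=0)
item_score = weighted.sum(axis=0) / similar_users_movies.notna().sum(axis=0)

ranked_item_score = item_score.sort_values(ascending=False)
print(ranked_item_score.head(10))
```

In this toy data, users 2 and 4 pass the threshold, and only movies the picked user has not rated (`Movie C`, `Movie D`) receive scores.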
- The script outputs the top recommended movies for a selected user (`picked_userId`) based on collaborative filtering.
- Evaluation metrics (e.g., precision, recall) and visualizations (e.g., heatmap of similarity matrices) can be added for performance analysis.
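As one possible starting point for the evaluation metrics mentioned above, precision and recall over the top-k recommendations could be computed as in this sketch. The function name, the movie titles, and the notion of "relevant" (for example, held-out movies the user rated highly) are assumptions; the project does not currently define them.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Hypothetical helper: precision and recall of the top-k recommendations.

    recommended: list of items ordered from best to worst score.
    relevant: set of items the user is known to like (e.g. held-out ratings).
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: one of the top three recommendations is relevant.
recommended = ["Movie C", "Movie D", "Movie E", "Movie F"]
relevant = {"Movie C", "Movie F", "Movie G"}
precision, recall = precision_recall_at_k(recommended, relevant, k=3)
print(precision, recall)
```

Using this in practice would require holding out part of each user's ratings before computing similarities, so that the recommendations are scored against data the system has not seen.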
- Implement evaluation metrics to quantify the performance of the recommendation system.
- Optimize code efficiency for larger datasets and real-time recommendations.
- Incorporate content-based filtering or hybrid approaches for improved recommendation accuracy.
- Author: Sadegh Khedry
This project is licensed under the Apache-2.0 License - see the LICENSE.md file for details.