Small Group Project for DATA 4380 course at The University of Texas at Arlington.
This repository holds an attempt to build a Recommendation system model using three different datasets from Kaggle.
- The goal is to build a collaborative-filtering recommendation system that can take in different datasets. The model predicts ratings for items a user has not yet bought, seen, or used; the 10 items with the highest predicted ratings can then be recommended to the user.
Data:
Preprocessing / Clean up:
- In preparing the data, we created a function that takes in the path of the dataset file and returns a dataframe.
- The function was written so that it can be used for any dataset that has user_id, rating, and item_id attributes.
- The item_id column was originally headed book_id, movie_id, or product_id in the datasets we worked with.
- We then created a function that formats the dataframe into the x, y inputs the model expects.
- Within this function the data is shuffled and split 90-10 into training and validation sets.
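A minimal sketch of what these loading and splitting helpers might look like, assuming pandas/NumPy; the function and column names here are illustrative, not the repository's exact code:

```python
import numpy as np
import pandas as pd

def load_ratings(path, item_col):
    """Load a ratings CSV and standardize the item column name
    (e.g. book_id, movie_id, product_id -> item_id)."""
    df = pd.read_csv(path)
    return df.rename(columns={item_col: "item_id"})[["user_id", "item_id", "rating"]]

def make_xy(df, seed=42):
    """Shuffle the ratings, then produce a 90-10 train/validation split
    of (user, item) inputs and rating targets."""
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    x = df[["user_id", "item_id"]].to_numpy()
    y = df["rating"].to_numpy()
    cut = int(0.9 * len(df))
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])
```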
Recommendation System model breakdown:
- Input: x, y vectors for the [user, item] pairs and the [ratings] targets
- Output: predicted ratings, used to find other items that the user or similar users would also like
- Adam optimizer was used
- In the first attempt, the model was trained for 5 epochs (for all the datasets)
- In the second attempt, the model was trained for 10 epochs (for all the datasets)
- We used ModelCheckpoint from Keras as a training callback; the checkpoint was set to save the model weights at each epoch if the loss improved over the previous epoch.
- Losses at 5 epochs and then 10 epochs (Movies dataset)
- Going forward, we plan to train the models for longer to improve accuracy.
- Although we were able to make predictions when we initially trained on the separate datasets, we ran into problems with saving the model weights when incorporating everything into a function, so we will work on that in the future.
- We also plan to evaluate the RMSE (Root Mean Square Error) of the predictions we made, to reduce bad predictions and outliers.
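One common way to realize the model described above in Keras is a dot product of learned user and item embeddings; the embedding size, layer names, and MSE loss below are assumptions for illustration, not necessarily the project's exact architecture. The checkpoint mirrors the ModelCheckpoint setup described above:

```python
import tensorflow as tf
from tensorflow import keras

def build_recmodel(n_users, n_items, dim=32):
    """Dot product of learned user and item embeddings -> predicted rating."""
    user_in = keras.Input(shape=(1,), name="user")
    item_in = keras.Input(shape=(1,), name="item")
    u = keras.layers.Embedding(n_users, dim)(user_in)   # (batch, 1, dim)
    i = keras.layers.Embedding(n_items, dim)(item_in)   # (batch, 1, dim)
    rating = keras.layers.Dot(axes=2)([u, i])           # (batch, 1, 1)
    rating = keras.layers.Flatten()(rating)             # (batch, 1)
    model = keras.Model([user_in, item_in], rating)
    model.compile(optimizer="adam", loss="mse")
    return model

# Save weights at each epoch only if the training loss improved.
checkpoint = keras.callbacks.ModelCheckpoint(
    "recmodel.weights.h5", monitor="loss",
    save_best_only=True, save_weights_only=True,
)
# model.fit([x_users, x_items], y_ratings, epochs=10, callbacks=[checkpoint])
```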
- makeRec: function to make 10 recommendations of dataset items given a random user
- prepData: functions that handle the loading and pre-processing of data
- recmodel: recommendation model architecture
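The makeRec idea — score every item for a given user and keep the ten highest predicted ratings — can be sketched independently of the model; the `predict` callable here is a stand-in assumption for whatever wraps the trained model:

```python
import numpy as np

def make_rec(predict, user_id, all_item_ids, seen_item_ids, k=10):
    """Return the k items with the highest predicted rating for a user,
    excluding items the user has already rated.

    `predict` is any callable mapping (user_ids, item_ids) -> scores,
    e.g. a thin wrapper around model.predict.
    """
    seen = set(seen_item_ids)
    candidates = np.array([i for i in all_item_ids if i not in seen])
    users = np.full(len(candidates), user_id)
    scores = np.asarray(predict(users, candidates))
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return candidates[top]
```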
- Saved model at each epoch if the loss improved from the previous epoch, for the Books dataset
- Saved model at each epoch if the loss improved from the previous epoch, for the Movies dataset
- Saved history of losses at each epoch for the Books dataset under log_losses
- Saved history of losses at each epoch for the Movies dataset under log_losses
- Breakdown of the Books notebook:
- Loading data
- Building model
- Training model
- Evaluating losses
- Example of 10 recommendations
- Movies notebook:
- With the same breakdown as the Books notebook.
- CFRC_Amazon_Still_Working.ipynb
- With the same breakdown as the Books notebook.
- Packages used in the notebooks: numpy, pandas, matplotlib, tensorflow, sklearn, scipy, cv2, os, PIL, pickle
- Tensorflow-metal PluggableDevice was installed to accelerate training with Metal on Mac GPUs.


