Small Group Project for DATA 4380 course at The University of Texas at Arlington.
This repository holds an attempt to build a Recommendation system model using three different datasets from Kaggle.
- The goal is to build a collaborative-filtering recommendation system that can take in different datasets. The model predicts ratings for items a user has not yet bought, seen, or used; the 10 items with the highest predicted ratings can then be recommended to the user.
Data:
Preprocessing / Clean up:
- In preparing the data, we created a function that takes in the path of the dataset file and returns a dataframe.
- The function was written so that it can be used for any dataset that has user_id, rating, and item_id attributes.
- The item_id column was originally headed book_id, movie_id, or product_id in the datasets we worked with.
- We then created a function that formats the dataframe into the x, y inputs the model expects.
- Within this function the data is shuffled and split 90-10 into training and validation sets.
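A minimal sketch of what these loading and splitting helpers might look like, assuming pandas/NumPy; the function and column names here are illustrative, not the repository's exact code:

```python
import numpy as np
import pandas as pd

def load_ratings(path, item_col):
    """Load a ratings CSV and standardize the item column name
    (e.g. book_id, movie_id, product_id -> item_id)."""
    df = pd.read_csv(path)
    return df.rename(columns={item_col: "item_id"})[["user_id", "item_id", "rating"]]

def make_xy(df, seed=42):
    """Shuffle the ratings, then produce a 90-10 train/validation split
    of (user, item) inputs and rating targets."""
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    x = df[["user_id", "item_id"]].to_numpy()
    y = df["rating"].to_numpy()
    cut = int(0.9 * len(df))
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])
```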
Recommendation System model breakdown:
- Input: x, y vectors for the [user, item] pairs and the [ratings] targets
- Output: predicted ratings, used to find other items that the user or similar users would also like
- Adam optimizer was used
- In the first attempt, the model was trained for 5 epochs (for all the datasets)
- In the second attempt, the model was trained for 10 epochs (for all the datasets)
- We used ModelCheckpoint from Keras as a training callback; the checkpoint was set to save the model weights at each epoch if the loss improved over the previous epoch.
- Losses at 5 epochs and then 10 epochs (Movies dataset)
- Going forward, we plan to train the models for longer to improve accuracy.
- Although we were able to make predictions when we initially trained on the separate datasets, we ran into problems with saving the model weights when incorporating everything into a function, so we will work on that in the future.
- We also plan to evaluate the RMSE (Root Mean Square Error) of the predictions we made, to reduce bad predictions and outliers.
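One common way to realize the model described above in Keras is a dot product of learned user and item embeddings; the embedding size, layer names, and MSE loss below are assumptions for illustration, not necessarily the project's exact architecture. The checkpoint mirrors the ModelCheckpoint setup described above:

```python
import tensorflow as tf
from tensorflow import keras

def build_recmodel(n_users, n_items, dim=32):
    """Dot product of learned user and item embeddings -> predicted rating."""
    user_in = keras.Input(shape=(1,), name="user")
    item_in = keras.Input(shape=(1,), name="item")
    u = keras.layers.Embedding(n_users, dim)(user_in)   # (batch, 1, dim)
    i = keras.layers.Embedding(n_items, dim)(item_in)   # (batch, 1, dim)
    rating = keras.layers.Dot(axes=2)([u, i])           # (batch, 1, 1)
    rating = keras.layers.Flatten()(rating)             # (batch, 1)
    model = keras.Model([user_in, item_in], rating)
    model.compile(optimizer="adam", loss="mse")
    return model

# Save weights at each epoch only if the training loss improved.
checkpoint = keras.callbacks.ModelCheckpoint(
    "recmodel.weights.h5", monitor="loss",
    save_best_only=True, save_weights_only=True,
)
# model.fit([x_users, x_items], y_ratings, epochs=10, callbacks=[checkpoint])
```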
- makeRec: function to make 10 recommendations of dataset items given a random user
- prepData: functions that handle the loading and pre-processing of data
- recmodel: recommendation model architecture
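The makeRec idea — score every item for a given user and keep the ten highest predicted ratings — can be sketched independently of the model; the `predict` callable here is a stand-in assumption for whatever wraps the trained model:

```python
import numpy as np

def make_rec(predict, user_id, all_item_ids, seen_item_ids, k=10):
    """Return the k items with the highest predicted rating for a user,
    excluding items the user has already rated.

    `predict` is any callable mapping (user_ids, item_ids) -> scores,
    e.g. a thin wrapper around model.predict.
    """
    seen = set(seen_item_ids)
    candidates = np.array([i for i in all_item_ids if i not in seen])
    users = np.full(len(candidates), user_id)
    scores = np.asarray(predict(users, candidates))
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return candidates[top]
```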
- Saved model at each epoch if the loss improved from the previous epoch, for the Books dataset
- Saved model at each epoch if the loss improved from the previous epoch, for the Movies dataset
- Saved history of losses at each epoch for the Books dataset under log_losses
- Saved history of losses at each epoch for the Movies dataset under log_losses
- Breakdown of the Books notebook:
- Loading data
- Building model
- Training model
- Evaluating losses
- Example of 10 recommendations
- Movies notebook:
- With the same breakdown as the Books notebook.
- CFRC_Amazon_Still_Working.ipynb
- With the same breakdown as the Books notebook.
- Packages used in the notebooks: numpy, pandas, matplotlib, tensorflow, sklearn, scipy, cv2, os, PIL, pickle
- Tensorflow-metal PluggableDevice was installed to accelerate training with Metal on Mac GPUs.


