This repository contains a hotel recommendation system built with several machine learning methods applied to the Expedia Hotel Recommendations dataset. The system predicts which hotel clusters a user is likely to interact with and compares different approaches to making those recommendations.
- Ritvik Roy
- Nihar Shah
- Sakshar Bhardwaj
We implemented and evaluated a set of recommendation algorithms to predict user preferences for hotel clusters based on past behavior. Because the full Expedia dataset is very large, the analysis focuses on a filtered subset for the Czech Republic, which allows for meaningful pattern discovery while keeping computation feasible.
The project includes both classical and modern methods for recommendation and provides quantitative results and visual insights for comparison.
The following approaches were implemented:
- User–User Collaborative Filtering: Computes similarity between users based on their hotel interaction patterns and recommends hotels using nearest neighbors.
- Item–Item Collaborative Filtering: Computes similarity between hotel clusters to recommend hotels that are frequently interacted with together.
- Singular Value Decomposition (SVD): Uses dimensionality reduction to extract latent factors from the user–item interaction matrix, capturing hidden user and item properties.
- Matrix Factorization: Uses stochastic gradient descent to learn low-rank user and item latent matrices that approximate the interaction matrix.
- Neural Network Recommendation: Implements a neural model with learnable user and item embeddings in PyTorch to capture nonlinear interactions.
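To make the matrix factorization approach above more concrete, here is a minimal NumPy sketch of learning user and item factors with stochastic gradient descent. It is an illustration under stated assumptions, not the repository's implementation: the function name `factorize`, the hyperparameters, the toy matrix, and the choice to treat zero entries as unobserved are all hypothetical.

```python
import numpy as np

def factorize(R, k=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn P (users x k) and Q (items x k) so that P @ Q.T
    approximates R on its observed (non-zero) cells."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    # Assumption: a zero entry means "no interaction observed".
    users, items = np.nonzero(R)
    losses = []
    for _ in range(epochs):
        sq_err = 0.0
        # Visit observed cells in a fresh random order each epoch.
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            err = R[u, i] - P[u] @ Q[i]
            sq_err += err ** 2
            p_u = P[u].copy()  # keep the old value for the Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
        losses.append(sq_err / len(users))  # mean squared error per epoch
    return P, Q, losses

# Toy interaction counts: 4 users x 3 hotel clusters (made-up data).
R = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1],
              [0, 1, 0]], dtype=float)
P, Q, losses = factorize(R)
print(f"first epoch MSE: {losses[0]:.3f}, last epoch MSE: {losses[-1]:.3f}")
```

The per-epoch values in `losses` are the kind of quantity a training loss curve would plot.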
The dataset used is the Expedia Hotel Recommendations dataset, available on Kaggle:
https://www.kaggle.com/c/expedia-hotel-recommendations
Because the full dataset is very large, we restrict the training data to interactions from the Czech Republic and use this filtered subset for all analysis and model training.
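The filtering step could look roughly like the pandas sketch below, which keeps one country's rows and builds a user by hotel-cluster count matrix. Several things here are assumptions: the column names `user_id`, `hotel_country`, and `hotel_cluster` follow the Kaggle `train.csv` schema, the source does not say whether the filter uses the hotel's country or the user's, `COUNTRY_ID` is a placeholder rather than the real code for the Czech Republic, and the helper `build_interactions` is hypothetical.

```python
import pandas as pd

COUNTRY_ID = 50  # hypothetical placeholder, not the real country code

def build_interactions(df, country_id, country_col="hotel_country"):
    """Keep rows for one country and count interactions per
    (user, hotel cluster) pair."""
    subset = df[df[country_col] == country_id]
    return pd.crosstab(subset["user_id"], subset["hotel_cluster"])

# Tiny made-up frame standing in for the real training data.
toy = pd.DataFrame({
    "user_id":       [1, 1, 2, 3, 3, 4],
    "hotel_country": [50, 50, 50, 8, 50, 8],
    "hotel_cluster": [5, 5, 7, 5, 9, 7],
})
interactions = build_interactions(toy, COUNTRY_ID)
print(interactions)
```

Rows are users, columns are hotel clusters, and each cell counts how often that user interacted with that cluster within the filtered subset.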
The repository includes visualizations for:
- User–user and item–item similarity matrices
- PCA plots of latent factors from SVD
- Training loss curves for matrix factorization and neural network models
- Embedding visualizations for model interpretation
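As an illustration of what sits behind the similarity-matrix plots, the sketch below computes user–user and item–item cosine similarity from a small interaction matrix using NumPy only. The function name and the toy data are hypothetical, and cosine similarity is an assumed choice of metric, since the source does not state which one the project uses.

```python
import numpy as np

def cosine_similarity_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = M / norms
    return unit @ unit.T

# Toy interaction counts: 4 users x 3 hotel clusters (made-up data).
R = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 0],
              [0, 1, 0]], dtype=float)

user_sim = cosine_similarity_matrix(R)    # shape (4, 4)
item_sim = cosine_similarity_matrix(R.T)  # shape (3, 3)
print(np.round(user_sim, 2))
print(np.round(item_sim, 2))
```

Either matrix can be passed to Matplotlib's `imshow` to produce a heatmap.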
The project uses:
- Python for data processing and model implementation
- NumPy and pandas for data manipulation
- scikit-learn for similarity computations and preprocessing
- Matplotlib for visualization
- PyTorch for the neural recommendation model
Potential extensions include:
- Adding ranking metrics such as Recall@K and NDCG
- Incorporating metadata for hybrid recommendation models
- Deploying the system as an API
- Scaling to larger datasets with distributed systems
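For the ranking metrics named in the first extension, a minimal sketch with binary relevance might look like the following. It is not part of the repository; the function names and the example lists are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(relevant)
    # Gain of 1 for each relevant item, discounted by log2(position + 1)
    # where position is 1-based (hence pos + 2 for 0-based pos).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal ranking puts all relevant items first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Made-up example: hotel clusters ranked for one user.
ranked = [5, 7, 9, 1]
relevant = {7, 1}
recall = recall_at_k(ranked, relevant, k=2)
ndcg = ndcg_at_k(ranked, relevant, k=2)
print(recall, round(ndcg, 4))
```

Averaging these values over all test users gives a single score per model, which would make the five approaches directly comparable on ranking quality.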