This project focuses on building and evaluating recommender systems that generate personalized suggestions for users. It covers the full pipeline of recommendation model development, including:
- Data preparation – transforming interaction logs (users, items, timestamps, ratings, etc.) into a structured dataset.
- Model training – applying classical algorithms (e.g., Matrix Factorization, kNN) and modern deep learning–based methods (e.g., Transformers like SASRec).
- Evaluation – assessing models with metrics such as MAP@k and Serendipity.
- Deployment & usage – generating top-k recommendations for users while supporting filtering, ranking, and interpretability.
Both models take as input a user sequence containing the user's previous interaction history. Item embeddings from this sequence are fed through transformer blocks, whose main components are multi-head self-attention and a position-wise feed-forward network. After one or more stacked attention blocks, the resulting latent representation of the user sequence is used to predict target items.
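The attention step above can be sketched as a minimal single-head self-attention over item embeddings in NumPy (a simplified illustration: the learned query/key/value projections and the feed-forward sublayer of a real transformer block are omitted):

```python
import numpy as np

def self_attention(X, causal=True):
    """Single-head self-attention over a (seq_len, d) matrix of item embeddings.
    With causal=True (SASRec-style), position i may only attend to positions j <= i."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)  # (seq_len, seq_len) attention scores
    if causal:
        # Mask out future positions for unidirectional attention.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X, weights
```

With `causal=False` every position attends to the whole sequence, which corresponds to BERT4Rec's bidirectional setting.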
SASRec is a transformer-based sequential model with a unidirectional attention mechanism and a "Shifted Sequence" training objective. The latent representation of the user sequence is used to predict the next item at every sequence position, where each prediction is based only on the preceding items.
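The "Shifted Sequence" objective amounts to pairing each prefix of the sequence with the item that follows it, which can be illustrated as:

```python
def shifted_targets(seq):
    """SASRec's "Shifted Sequence" objective: at position t the model, having
    seen seq[:t+1], must predict the next item seq[t+1]. Inputs drop the last
    item; targets drop the first."""
    return seq[:-1], seq[1:]

inputs, targets = shifted_targets([3, 7, 2, 9])
# inputs  = [3, 7, 2]  (prefix visible to the model at each position)
# targets = [7, 2, 9]  (next item to predict at each position)
```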
BERT4Rec is a transformer-based sequential model with a bidirectional attention mechanism and an "Item Masking" (i.e., MLM) training objective. The latent representation of the user sequence is used to predict the masked items.
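Item masking can be sketched as follows (a minimal illustration; the masking probability and the reserved mask id are assumptions, and production code also applies the random-replacement tricks from the MLM literature):

```python
import random

MASK = 0  # assumed reserved id for the [mask] token; real item ids start at 1

def mask_items(seq, p=0.2, rng=None):
    """BERT4Rec's "Item Masking" (MLM) objective: replace a random subset of
    items with a mask token; the model must recover the originals using both
    left and right context via bidirectional attention."""
    rng = rng or random.Random()
    masked, labels = [], []
    for item in seq:
        if rng.random() < p:
            masked.append(MASK)
            labels.append(item)   # this position is scored against the true item
        else:
            masked.append(item)
            labels.append(None)   # this position contributes no loss
    return masked, labels
```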
| Difference | SASRec | BERT4Rec |
|---|---|---|
| Training objective | Shifted sequence target | Item masking target |
| Attention | Uni-directional | Bi-directional |
| Transformer block | Check the details below | Check the details below |
| Loss in original paper | Binary cross-entropy (BCE) with 1 negative per positive | Cross-entropy (Softmax) on full items catalog |
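The two loss setups in the last row can be compared on a single position's logits (a minimal NumPy sketch; real implementations operate on batches and use numerically stable library ops):

```python
import numpy as np

def bce_one_negative(logits, pos, neg):
    """BCE over one positive and one sampled negative item (SASRec paper)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -(np.log(sigmoid(logits[pos])) + np.log(1.0 - sigmoid(logits[neg])))

def full_softmax_ce(logits, pos):
    """Cross-entropy over the full item catalog (BERT4Rec paper)."""
    z = logits - logits.max()  # stabilize the softmax
    return -(z[pos] - np.log(np.exp(z).sum()))
```

The BCE variant only ever touches two item scores per position, while the softmax variant normalizes over the whole catalog, which is more expensive but gives a stronger training signal.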
The Indonesia Tourism Destination Dataset contains information on tourist attractions in five major Indonesian cities: Jakarta, Yogyakarta, Semarang, Bandung, and Surabaya. The dataset was used in GetLoc, a Bangkit Academy 2021 capstone project. It consists of four files:
- `tourism_with_id.csv`: information on ~400 tourist attractions in the five major cities
- `user.csv`: dummy user data for building user-based recommendation features
- `tourism_rating.csv`: three columns (user, place, rating) used to build the rating-based recommender
- `package_tourism.csv`: recommendations for nearby places based on time, cost, and rating
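For the sequential models, the rating log has to be grouped into per-user item sequences. A minimal sketch, using hypothetical in-memory rows shaped like `tourism_rating.csv` (the real column names and a relevance threshold of 3 are assumptions):

```python
from collections import defaultdict

# Hypothetical (user, place, rating) rows mimicking tourism_rating.csv.
rows = [
    (1, 101, 4), (1, 205, 5), (2, 101, 3), (1, 307, 2), (2, 412, 5),
]

def build_sequences(rows, min_rating=3):
    """Group the rating log into per-user item sequences, keeping only
    interactions at or above the assumed relevance threshold."""
    seqs = defaultdict(list)
    for user, place, rating in rows:
        if rating >= min_rating:
            seqs[user].append(place)
    return dict(seqs)
```

A real pipeline would also sort each user's interactions by timestamp before building the sequence.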
Mean Average Precision at k (MAP@k): Average Precision (AP) measures the precision of recommendations while taking their order into account, and MAP@k is the mean of AP over all users.
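The metric can be sketched as follows (one common AP@k formulation; the normalization by `min(|relevant|, k)` is an assumption, as conventions vary between libraries):

```python
def average_precision_at_k(recommended, relevant, k=10):
    """AP@k: precision at each rank where a relevant item appears,
    averaged over min(|relevant|, k)."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(recs_per_user, rels_per_user, k=10):
    """MAP@k: mean AP@k across all users."""
    aps = [average_precision_at_k(r, g, k)
           for r, g in zip(recs_per_user, rels_per_user)]
    return sum(aps) / len(aps)
```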
Serendipity is measured as the average relevance of recommended items, weighted by their dissimilarity from the user's past interactions.
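That definition can be sketched as follows (a simplified illustration: the per-item similarity values are assumed to be precomputed in `history_sim`; a real pipeline would derive them from item features or co-occurrence statistics):

```python
def serendipity_at_k(recommended, relevant, history_sim, k=10):
    """Average relevance of the top-k items, each weighted by its
    dissimilarity (1 - similarity) to the user's past interactions.
    history_sim maps item -> similarity in [0, 1] to the user's history."""
    relevant = set(relevant)
    top = recommended[:k]
    scores = [(item in relevant) * (1.0 - history_sim.get(item, 0.0))
              for item in top]
    return sum(scores) / len(top)
```

A relevant but familiar item (similarity near 1) contributes almost nothing, while a relevant and unfamiliar item contributes fully, which is why the scores in the table below are so small.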
| Model | MAP@10 | Serendipity@10 |
|---|---|---|
| Popular (Base Model) | 0.006990 | 0.000007 |
| Ease (Base Model) | 0.005978 | 0.000296 |
| BERT4Rec Softmax | 0.006615 | 0.000337 |
| SASRec Softmax | 0.008706 | 0.000334 |
| SASRec BCE | 0.007420 | 0.000211 |
| SASRec GBCE | 0.007404 | 0.000261 |
Results:
- MAP@10 (Mean Average Precision): SASRec Softmax achieves the highest MAP@10 score (0.008706), meaning it is the most effective at ranking relevant recommendations at the top of the list for users. This indicates users are more likely to find items they prefer among the top 10.
- Serendipity: SASRec Softmax also has one of the highest Serendipity@10 scores (0.000334), showing it can recommend items that are both relevant and novel (not just similar to what users have already interacted with). This helps users discover new and interesting destinations.
Based on evaluation metrics, we choose SASRec with Softmax as the recommendation model.
- SASRec original paper: Self-Attentive Sequential Recommendation
- Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?
- gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling
- BERT4Rec original paper: BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
- Comparison of BERT4Rec implementations: A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation
- uv
- Python v3.12.*
- Indonesia Tourism Destination Dataset
```
uv sync
uv run pre-commit install
```

Or you can use the make command:

```
make initial-setup
```

