This repo:
- Collects data from the Codeforces API to train a rating-predictor model
- Prepares the collected data for consumption by a decision-tree model, generating (features, label) pairs
- Trains a CatBoost model on the generated data, achieving a mean absolute error of 65.15 when predicting a user's rating 6 months in the future
Use the notebook `retrieve_data_from_codeforces_api.ipynb` to collect data from active Codeforces users. It defaults to the 1.5k highest-rated users plus 50k random users (the rating distribution has a long tail, so all very highly rated users are included to improve performance for them). The notebook collects each user's submission, rating-change, and blog history. This is the slowest step, taking an entire day to complete; ask the author for the compressed data if needed.
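The collection loop boils down to calling a few Codeforces API methods per user. A minimal sketch is below; the helper names (`api_url`, `fetch`) are illustrative, not the notebook's actual code, and in practice you would also need to throttle requests to respect the API's rate limits:

```python
import json
import urllib.request

API = "https://codeforces.com/api"

def api_url(method, **params):
    # Build a Codeforces API URL, e.g. .../user.rating?handle=tourist
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"{API}/{method}?{query}" if query else f"{API}/{method}"

def fetch(method, **params):
    # Codeforces wraps every response as {"status": "OK", "result": ...}
    with urllib.request.urlopen(api_url(method, **params), timeout=30) as resp:
        payload = json.load(resp)
    if payload["status"] != "OK":
        raise RuntimeError(payload.get("comment", "request failed"))
    return payload["result"]

# Per-user history used later for feature extraction:
#   fetch("user.status", handle=h)       -> submissions
#   fetch("user.rating", handle=h)       -> rating changes (deltas)
#   fetch("user.blogEntries", handle=h)  -> blog posts
```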
After collecting the data in the previous step, run `python generateLabeledData.py` to prepare the labels.
For each user, on the 1st day of each month from Jan 2021 to Nov 2024, we extract features from that user's collected history as well as their rating 6 months in the future (6 × 30 × 24 × 60 × 60 seconds, to be precise). If the user had fewer than 5 rated contests at that point, we DO NOT generate features and simply skip the sample. Otherwise we generate a (features, label) pair and save it to a CSV file for later use. Note that each user may generate anywhere from 0 (if they never reach 5 rated contests) up to 47 (features, label) pairs.
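The sampling scheme above can be sketched as follows. This is a simplified stand-in for `generateLabeledData.py`: `extract_features` is a hypothetical callback, and the rating lookup assumes the collected rating changes are sorted by timestamp:

```python
import bisect
import datetime as dt

HORIZON = 6 * 30 * 24 * 60 * 60  # ~6 months, in seconds
MIN_CONTESTS = 5

def month_starts(first=(2021, 1), last=(2024, 11)):
    # Unix timestamps for the 1st day of each month in the range (UTC).
    y, m = first
    while (y, m) <= last:
        yield int(dt.datetime(y, m, 1, tzinfo=dt.timezone.utc).timestamp())
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def rating_at(rating_changes, t):
    # rating_changes: [(timestamp, new_rating)] sorted by timestamp.
    # Returns the user's rating as of time t, or None before their first contest.
    i = bisect.bisect_right([ts for ts, _ in rating_changes], t)
    return rating_changes[i - 1][1] if i else None

def samples_for_user(rating_changes, extract_features):
    # Yield (features, label) pairs; skip months with < MIN_CONTESTS rated contests.
    for t in month_starts():
        past = [rc for rc in rating_changes if rc[0] <= t]
        if len(past) < MIN_CONTESTS:
            continue
        yield extract_features(past, t), rating_at(rating_changes, t + HORIZON)
```

Jan 2021 through Nov 2024 spans 47 month starts, which is where the maximum of 47 pairs per user comes from.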
The notebook `training_notebook.ipynb` performs the following steps:
- User Separation: Reads users from `data/users.json` and randomly splits them into training (80%) and validation (20%) sets
- Data Loading: Reads labeled data from `data/labeled_data.csv`
- Temporal Separation:
  - Training set: 2021-01-01 ≤ date ≤ 2023-11-01
  - Validation set: 2023-01-01 ≤ date ≤ 2024-11-01
- Model Training: Trains a CatBoost regression model with Mean Absolute Error (MAE) loss
- Evaluation: Reports MAE on the validation set
- Model Persistence: Saves the trained model for later use
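The split-and-train pipeline can be sketched as below. This is not the notebook's actual code: the function names, seed, and output filename are illustrative, and CatBoost is imported lazily inside `train` so the split helper runs on its own:

```python
import random

def split_users(handles, train_frac=0.8, seed=42):
    # User-level 80/20 split, so no user contributes samples to both sets.
    handles = sorted(handles)
    rng = random.Random(seed)
    rng.shuffle(handles)
    cut = int(len(handles) * train_frac)
    return set(handles[:cut]), set(handles[cut:])

def train(train_df, valid_df, feature_cols, label_col="label"):
    # Train a CatBoost regressor with MAE loss and persist it to disk.
    from catboost import CatBoostRegressor
    model = CatBoostRegressor(loss_function="MAE", eval_metric="MAE")
    model.fit(train_df[feature_cols], train_df[label_col],
              eval_set=(valid_df[feature_cols], valid_df[label_col]))
    model.save_model("rating_predictor.cbm")  # hypothetical filename
    return model
```

Splitting by user (on top of the temporal split) prevents leakage of a user's idiosyncratic rating trajectory from training into validation.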