
Codeforces Rating Predictor

This repo:

  • Collects data from the Codeforces API to train a rating predictor model
  • Prepares that data for consumption by a decision-tree model, generating (features, label) pairs
  • Trains a CatBoost model on the generated data, achieving a mean absolute error of 65.15 when predicting a user's rating 6 months in the future

Data collection

Use the notebook retrieve_data_from_codeforces_api.ipynb to collect data from active Codeforces users. It defaults to the 1.5k highest-rated users plus 50k random users (the rating distribution has a long tail, so all very highly rated users are included to improve performance for them). The notebook collects each user's submission, rating-change, and blog history. This is the slowest step, taking an entire day to complete. Ask the author for the compressed data if needed.
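As a rough illustration of what the notebook does (the helper names here are placeholders, not the notebook's actual code), the relevant Codeforces API methods can be queried like this:

```python
import json
import urllib.request

API_BASE = "https://codeforces.com/api"

def api_url(method: str, **params) -> str:
    """Build a Codeforces API request URL for a method such as user.status."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"{API_BASE}/{method}?{query}" if query else f"{API_BASE}/{method}"

def fetch(method: str, **params):
    """Call the API and return the 'result' payload (raises if status != OK)."""
    with urllib.request.urlopen(api_url(method, **params)) as resp:
        data = json.load(resp)
    if data["status"] != "OK":
        raise RuntimeError(data.get("comment", "API error"))
    return data["result"]

# Example calls (network access required):
#   fetch("user.ratedList", activeOnly="true")  # all active rated users
#   fetch("user.rating", handle="tourist")      # rating-change history
#   fetch("user.status", handle="tourist")      # submission history
```

The API is rate-limited, which is part of why a full crawl of ~51.5k users takes about a day.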

Data preparation

After collecting the data in the previous step, run python generateLabeledData.py to prepare the labels. For each user, on the 1st day of each month from Jan 2021 to Nov 2024, we extract features for that user (from the previously collected data) as well as their rating 6 months in the future (6 * 30 * 24 * 60 * 60 seconds in the future, to be precise). If the user had fewer than 5 rated contests at that point, we do NOT generate features and simply skip the sample. Otherwise we generate a (features, label) pair and save it to a CSV file for later use. Note that each user may generate from 0 (if they never participated in 5 rated contests) up to 47 (features, label) pairs.
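The monthly labeling loop described above can be sketched as follows — a minimal illustration, assuming the rating history is a list of (timestamp, new_rating) pairs sorted by timestamp, with `feature_fn` standing in for the script's actual feature extraction:

```python
from datetime import datetime, timezone

SIX_MONTHS = 6 * 30 * 24 * 60 * 60  # "6 months" in seconds
MIN_CONTESTS = 5

def month_starts(first=(2021, 1), last=(2024, 11)):
    """Yield Unix timestamps for the 1st day of each month in [first, last]."""
    y, m = first
    while (y, m) <= last:
        yield int(datetime(y, m, 1, tzinfo=timezone.utc).timestamp())
        m += 1
        if m == 13:
            y, m = y + 1, 1

def rating_at(history, t):
    """Rating after the last rated contest at or before time t, or None."""
    rating = None
    for ts, new_rating in history:
        if ts > t:
            break
        rating = new_rating
    return rating

def labeled_pairs(history, feature_fn):
    """(features, label) pairs; skip months with < MIN_CONTESTS rated contests."""
    pairs = []
    for t in month_starts():
        past = [h for h in history if h[0] <= t]
        label = rating_at(history, t + SIX_MONTHS)
        if len(past) < MIN_CONTESTS or label is None:
            continue
        pairs.append((feature_fn(past, t), label))
    return pairs
```

The 47-pair upper bound per user falls out of the date range: Jan 2021 through Nov 2024 contains exactly 47 month starts.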

Model training

The notebook training_notebook.ipynb performs the following steps:

  1. User Separation: Reads users from data/users.json and randomly splits them into training (80%) and validation (20%) sets
  2. Data Loading: Reads the labeled data from data/labeled_data.csv
  3. Temporal Separation:
    • Training set: 2021-01-01 ≤ date ≤ 2023-11-01
    • Validation set: 2023-01-01 ≤ date ≤ 2024-11-01
  4. Model Training: Trains a CatBoost regression model with Mean Absolute Error (MAE) loss
  5. Evaluation: Reports the MAE on the validation set
  6. Model Persistence: Saves the trained model for later use
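The split logic above can be sketched as follows — a simplified illustration, not the notebook's actual code (variable names like `X_train` and the model filename are placeholders):

```python
import random
from datetime import date

def split_users(handles, train_frac=0.8, seed=42):
    """Randomly split user handles into disjoint train/validation sets."""
    rng = random.Random(seed)
    handles = sorted(handles)
    rng.shuffle(handles)
    cut = int(len(handles) * train_frac)
    return set(handles[:cut]), set(handles[cut:])

def in_window(sample_date, lo, hi):
    """Temporal filter: keep a sample only if lo <= sample_date <= hi."""
    return lo <= sample_date <= hi

TRAIN_WINDOW = (date(2021, 1, 1), date(2023, 11, 1))
VALID_WINDOW = (date(2023, 1, 1), date(2024, 11, 1))

# With feature matrices assembled from data/labeled_data.csv,
# the model is then fit with MAE loss along these lines:
try:
    from catboost import CatBoostRegressor
    model = CatBoostRegressor(loss_function="MAE")
    # model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    # model.save_model("model.cbm")
except ImportError:
    pass  # catboost not installed; the split helpers above still work
```

Splitting by user (not just by time) keeps every validation user entirely unseen during training, so the MAE estimate is not inflated by the model memorizing a user's trajectory.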
