A Python library for the DecryptX Round 3 Data Cleaning Contest.
Install directly from GitHub:
pip install git+https://github.com/keanesc/decryptx-helper.git
Or for development:
git clone https://github.com/keanesc/decryptx-helper.git
cd decryptx-helper
pip install -e .
The library automatically downloads the FIFA dataset from the DecryptX server when you call load_data(). The dataset is cached locally in ~/.cache/decryptx/ and refreshed every 24 hours.
You can customize the dataset location using an environment variable:
DECRYPTX_DATA_PATH: Direct path to a local dataset file (bypasses download)
Example:
# Use a local file (no download)
export DECRYPTX_DATA_PATH="/path/to/your/fifa_raw_data.csv"
Google Colab Note: The dataset will be automatically downloaded on first use. No manual upload required!
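The lookup order described above could be sketched roughly as follows. This is not the library's actual code: `resolve_dataset_path`, `needs_refresh`, and the cache file name `fifa_raw_data.csv` are hypothetical names used for illustration. Only `DECRYPTX_DATA_PATH`, the `~/.cache/decryptx/` directory, and the 24-hour refresh interval come from this README.

```python
import os
import time
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "decryptx"  # cache location stated in this README
CACHE_FILE = CACHE_DIR / "fifa_raw_data.csv"     # hypothetical cache file name
MAX_AGE_SECONDS = 24 * 60 * 60                   # 24-hour refresh interval


def needs_refresh(path, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the file is missing or older than max_age seconds."""
    path = Path(path)
    if not path.exists():
        return True
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) > max_age


def resolve_dataset_path(env=None):
    """Return (path, must_download) following the order described above."""
    env = os.environ if env is None else env
    override = env.get("DECRYPTX_DATA_PATH")
    if override:
        # A local file was supplied, so the download is bypassed.
        return Path(override), False
    # Otherwise fall back to the cache and download when it is stale or missing.
    return CACHE_FILE, needs_refresh(CACHE_FILE)
```

Passing `env` explicitly makes the behaviour easy to check without touching the real environment, for example `resolve_dataset_path({"DECRYPTX_DATA_PATH": "/tmp/fifa.csv"})`.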
from decryptx import login, load_data, submit
# 1. Login with your team credentials
session = login(team_name="YourTeamName", password="your_password")
# 2. Load the raw FIFA dataset
df = load_data()
# 3. Clean your data (YOUR WORK GOES HERE)
df_clean = your_cleaning_function(df)
# 4. Submit your cleaned dataset
# This will automatically split data, train a fixed model, evaluate performance, and submit the score.
result = submit(session, df_clean)
print(f"Remaining attempts: {result['remainingAttempts']}/5")
Authenticate with the DecryptX server.
Returns: Session dictionary containing teamId, sessionId, and qualification status.
Load the raw FIFA dataset that needs cleaning.
Returns: pandas DataFrame with the raw data.
Submit your cleaned dataset for evaluation.
This function:
- Splits your data into train/test sets (fixed random seed)
- Trains a standardized Random Forest model
- Evaluates RMSE on the test set
- Submits the score to the leaderboard
Returns: Submission result dictionary.
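The evaluation code itself is not shown in this README, but the metric named in the steps above, RMSE, has a standard definition: the square root of the mean squared difference between actual and predicted values. A minimal sketch, with a hypothetical `rmse` helper and made-up ratings:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings and predictions, for illustration only.
actual = [80, 75, 90]
predicted = [78, 75, 94]
print(round(rmse(actual, predicted), 3))  # 2.582
```

Because errors are squared before averaging, a few badly wrong predictions raise the score more than many slightly wrong ones.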
Submit your score to the leaderboard.
Returns: Submission result with remainingAttempts and status.
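Since this README limits teams to 5 submissions with a 1-minute cooldown, a small client-side guard can help avoid a rejected attempt. The sketch below is an assumption, not part of the library: `SubmissionGuard` and its methods are hypothetical, and the server's `remainingAttempts` value remains the authoritative count.

```python
import time

MAX_ATTEMPTS = 5        # lifetime limit stated in this README
COOLDOWN_SECONDS = 60   # minimum wait between submissions


class SubmissionGuard:
    """Hypothetical helper that tracks attempts and the cooldown locally."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, useful for testing
        self._last_submit = None     # time of the most recent submission
        self.attempts_used = 0

    def seconds_until_ready(self):
        """Seconds left before the cooldown has elapsed (0.0 if ready)."""
        if self._last_submit is None:
            return 0.0
        elapsed = self._clock() - self._last_submit
        return max(0.0, COOLDOWN_SECONDS - elapsed)

    def can_submit(self):
        """True if attempts remain and the cooldown has elapsed."""
        return self.attempts_used < MAX_ATTEMPTS and self.seconds_until_ready() == 0.0

    def record_submission(self):
        """Register one submission; raises if it would break a limit."""
        if not self.can_submit():
            raise RuntimeError("submission not allowed yet")
        self._last_submit = self._clock()
        self.attempts_used += 1
```

One way to use it is to check `guard.can_submit()` before calling `submit(session, df_clean)` and call `guard.record_submission()` right after.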
- Fixed Parameters: The train/test split uses random_state=42 and test_size=0.2. Do not modify these.
- 5 Submission Limit: You have exactly 5 submission attempts total (lifetime limit).
- 1 Minute Cooldown: Wait at least 1 minute between submissions.
- RMSE Scoring: Lower RMSE is better. The target is the player's Overall Rating (OVA).
- Data Cleaning Focus: The competition is about data cleaning, not model architecture. A simple model on well-cleaned data often beats a complex model on dirty data.
- Handle missing values appropriately
- Parse numeric values from strings (e.g., "€103.5M" → 103500000)
- Handle height/weight formats (e.g., "170cm" → 170)
- Remove or encode special characters
- Consider feature engineering from the available columns
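Two of the tips above, parsing currency strings and stripping units, can be sketched with the standard library alone. The helper names `parse_money` and `parse_measurement` are hypothetical, and the set of suffixes and symbols handled here is an assumption. Inspect the actual values in the dataset before relying on them.

```python
import re

# Assumed suffixes: thousands, millions, billions.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_money(text):
    """Convert strings such as '€103.5M' or '€500K' to an int; None if unparseable."""
    match = re.fullmatch(
        r"\s*[€$£]?\s*(\d+(?:\.\d+)?)\s*([KMB])?\s*", str(text), re.IGNORECASE
    )
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[suffix.upper()] if suffix else 1
    return round(float(number) * multiplier)


def parse_measurement(text, unit):
    """Convert strings such as '170cm' to a float; None if the unit does not match."""
    match = re.fullmatch(
        rf"\s*(\d+(?:\.\d+)?)\s*{re.escape(unit)}\s*", str(text), re.IGNORECASE
    )
    if match is None:
        return None
    return float(match.group(1))


print(parse_money("€103.5M"))            # 103500000
print(parse_measurement("170cm", "cm"))  # 170.0
```

With pandas, these could be applied column-wise, for example `df["Value"].map(parse_money)`, where the column name `Value` is an assumption. Returning `None` for unparseable values keeps them visible as missing data instead of silently turning them into zeros.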
MIT License