Every cyclist and stage of the Tour de France (up to and including 2025) in four CSV files.

thomascamminady/LeTourDataSet

Le Tour de France Data Set

If you use pandas, just get the data via:

import pandas as pd

# Men's Tour de France data
df_men_riders = pd.read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/men/TDF_Riders_History.csv")
df_men_stages = pd.read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/men/TDF_Stages_History.csv")

# Women's Tour de France data
df_women_riders = pd.read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/women/TDFF_Riders_History.csv")
df_women_stages = pd.read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/women/TDFF_Stages_History.csv")

Figure: Distance and winner average pace

If you use R instead of Python, you can run:

library(readr)

# Men's Tour de France data
df_men_riders <- read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/men/TDF_Riders_History.csv")
df_men_stages <- read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/men/TDF_Stages_History.csv")

# Women's Tour de France data
df_women_riders <- read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/women/TDFF_Riders_History.csv")
df_women_stages <- read_csv("https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master/data/women/TDFF_Stages_History.csv")

Quick Start

# Install dependencies
make install

# Download latest data (2025 men's and 2024 women's included)
make update

# Generate plots
make plot

# Or do both
make all

Data Structure

The repository is organized as follows:

data/
├── men/                    # Men's Tour de France data
│   ├── TDF_Riders_History.csv
│   ├── TDF_Stages_History.csv
│   └── TDF_All_Rankings_History.csv
├── women/                  # Women's Tour de France data
│   ├── TDFF_Riders_History.csv
│   ├── TDFF_Stages_History.csv
│   └── TDFF_All_Rankings_History.csv
└── plots/                  # Generated visualizations
    ├── TDF_Distance_And_Pace.png
    └── TDFF_Distance_And_Pace.png
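
Because the file names follow a regular pattern, the path or raw URL of any file in the tree can be built from its category and kind. The following is a minimal sketch, not part of the repository: `relative_path` and `raw_url` are hypothetical helpers, the base URL is copied from the loading examples in this README, and the URLs for the `All_Rankings` files are an assumption that they follow the same pattern as the riders and stages files.

```python
from pathlib import PurePosixPath

# Base URL copied from the loading examples in this README.
RAW_BASE = "https://raw.githubusercontent.com/thomascamminady/LeTourDataSet/master"

# File-name prefixes and kinds as they appear in the data/ tree.
PREFIXES = {"men": "TDF", "women": "TDFF"}
KINDS = ("Riders", "Stages", "All_Rankings")


def relative_path(category: str, kind: str) -> str:
    """Return the repository-relative path of one CSV file (hypothetical helper)."""
    if category not in PREFIXES:
        raise ValueError(f"unknown category: {category!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    name = f"{PREFIXES[category]}_{kind}_History.csv"
    return str(PurePosixPath("data") / category / name)


def raw_url(category: str, kind: str) -> str:
    """Return the raw GitHub URL of one CSV file (hypothetical helper)."""
    return f"{RAW_BASE}/{relative_path(category, kind)}"


print(raw_url("women", "Stages"))
```

The returned string can be passed directly to `pd.read_csv`, and `relative_path` gives the matching location inside a local clone.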

Le Tour de France Femmes avec Zwift

In 2022, the women's Tour de France was relaunched as "Le Tour de France Femmes avec Zwift". Its data is published on the official tour website and is included in this dataset, with complete coverage through 2025.

Data Coverage

  • Men's Tour de France: 1903-2025 (complete historical coverage)
  • Women's Tour de France: 2022-2025 (complete since relaunch)

Available Files

Men's Data (data/men/)

  • TDF_Riders_History.csv: Data for every cyclist, including winners
  • TDF_Stages_History.csv: Stage-by-stage information
  • TDF_All_Rankings_History.csv: Comprehensive rankings data

Women's Data (data/women/)

  • TDFF_Riders_History.csv: Data for every cyclist, including winners
  • TDFF_Stages_History.csv: Stage-by-stage information
  • TDFF_All_Rankings_History.csv: Comprehensive rankings data

How to Run

Using Make (Recommended)

# Install dependencies
make install

# Download latest data
make update

# Generate plots
make plot

# Run everything
make all

# See all available commands
make help

Manual Execution

# Install environment
poetry install

# Download data (run from the repository root; the subshell keeps your working directory unchanged)
(cd scripts && poetry run python download_data.py)

# Generate plots
(cd scripts && poetry run python generate_plots.py)

Development

# Install development dependencies
make install

# Run tests
make test

# Format code
make format

# Check code quality
make lint

# Check CSV data integrity
make check-csv

Data Protection

This repository includes automated data protection to ensure historical data integrity:

  • Row Protection: CSV files can only grow (no data deletion)
  • Column Protection: New columns allowed, existing columns protected
  • Data Integrity: Existing data cannot be modified
  • Automatic Validation: GitHub Actions check every pull request
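
To make these rules concrete, here is a minimal sketch of a check between an old and a new version of one CSV file. It is an illustration under stated assumptions, not the repository's actual validation code: `protection_violations` is a hypothetical function, the `Year` and `Rider` columns and their values are placeholders, and rows are compared as an unordered collection so that re-sorting a file is not counted as a modification.

```python
import csv
import io
from collections import Counter


def _read(text: str) -> tuple[list[str], list[dict]]:
    """Parse CSV text into (column names, rows)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def protection_violations(old_csv: str, new_csv: str) -> list[str]:
    """List the ways new_csv breaks the protection rules relative to old_csv."""
    old_cols, old_rows = _read(old_csv)
    new_cols, new_rows = _read(new_csv)
    problems = []

    # Column protection: existing columns must survive, new ones are allowed.
    missing = [c for c in old_cols if c not in new_cols]
    if missing:
        problems.append(f"columns removed: {missing}")

    # Row protection: the file may only grow.
    if len(new_rows) < len(old_rows):
        problems.append(f"row count shrank from {len(old_rows)} to {len(new_rows)}")

    # Data integrity: every old row must still be present, compared on the
    # old columns that still exist (order-insensitive, duplicates counted).
    shared = [c for c in old_cols if c in new_cols]
    old_counts = Counter(tuple(r[c] for c in shared) for r in old_rows)
    new_counts = Counter(tuple(r[c] for c in shared) for r in new_rows)
    lost = old_counts - new_counts
    if lost:
        problems.append(f"{sum(lost.values())} existing row(s) changed or deleted")

    return problems


# Placeholder data: column names and values are made up for illustration.
old = "Year,Rider\n1903,A\n1904,B\n"
grown = "Year,Rider,Team\n1903,A,X\n1904,B,Y\n1905,C,Z\n"
edited = "Year,Rider\n1903,A\n1904,Changed\n"

print(protection_violations(old, grown))   # no violations: rows and a column were only added
print(protection_violations(old, edited))  # one existing row was modified
```

An empty list means the new version only adds rows or columns; any entry describes a rule that a pull request check of this kind would reject.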

Recent Updates (2025)

  • ✅ Added 2025 men's Tour de France data
  • ✅ Added 2024 women's Tour de France data
  • ✅ Reorganized repository with modern Python packaging
  • ✅ Renamed classes: Downloader → Scraper, Plotter → Visualizer
  • ✅ Added comprehensive Makefile for easy data management
  • ✅ Implemented CSV data protection workflows
  • ✅ Organized data into men/women subfolders

Disclaimer

For issues with this data set, see the Issues tab. Some entries may be incorrect due to source data issues on the official websites. When discrepancies are found, they typically stem from the original letour.fr or letourfemmes.fr websites.

Annual Update Workflow

For maintainers updating the dataset with new Tour de France data:

Simple One-Command Update (Recommended)

make update

This comprehensive command will:

  1. 📥 Download the latest Tour de France data from official sources
  2. 🔧 Post-process and sort all data files
  3. 🩹 Automatically fix any missing riders history data
  4. 🛡️ Verify CSV file integrity
  5. 📊 Generate updated plots and visualizations

After running this command, simply review the changes and commit/push if everything looks correct.

Manual Step-by-Step (If Needed)

make install             # Install dependencies
make download-only       # Download new data only
make postprocess         # Sort and organize data
make fix-riders-history  # Fix any missing general classification data
make check-csv           # Verify data integrity
make plot                # Generate plots
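
If you prefer to script the manual steps, the sketch below runs the same targets in order and stops at the first failure, so that later steps never run on partial data. It is a hedged illustration rather than part of the repository: `run_steps`, `run_make`, and `dry_run` are hypothetical names, and the only assumption about the Makefile is the list of targets shown above.

```python
import subprocess
from typing import Callable, Sequence

# Make targets in the order listed above.
STEPS = (
    "install",
    "download-only",
    "postprocess",
    "fix-riders-history",
    "check-csv",
    "plot",
)


def run_make(target: str) -> int:
    """Run one make target and return its exit code."""
    return subprocess.run(["make", target], check=False).returncode


def dry_run(target: str) -> int:
    """Print the command instead of running it; always reports success."""
    print(f"make {target}")
    return 0


def run_steps(
    steps: Sequence[str] = STEPS,
    runner: Callable[[str], int] = run_make,
) -> list[str]:
    """Run targets in order, stop at the first failure, return those that succeeded."""
    done = []
    for target in steps:
        if runner(target) != 0:
            print(f"'make {target}' failed; stopping")
            break
        done.append(target)
    return done


# Dry run: prints the commands without executing anything.
run_steps(runner=dry_run)
```

Calling `run_steps()` with the default runner executes the real targets from the repository root; the returned list shows how far the workflow got.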

The workflow is designed to be robust and handle different data structures that may appear in future years.

Legacy Code

This code has been completely rewritten for 2025. The previous code and output are available in the legacy repository. See legacy/README.txt for historical context.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes (data will be automatically validated)
  4. Submit a pull request

The CSV data protection system will automatically verify that:

  • No historical data is deleted or modified
  • Only new data additions are permitted
  • Data integrity is maintained across all changes
