@RodriguesRBruno
Contributor

This PR adds support for running the MedPerf Data Preparation step in Airflow, alongside the existing format of a single Data Preparation container. Airflow workflows can be defined in simplified YAML files that specify the data preparation steps. Each step must be a container, but different steps may use different container images. Three example data preparation workflows are added to the examples directory: chestxray_tutorial/data_preparation_workflow (a modified version of the Chest X Ray example from the MedPerf documentation), RANO/data_preparation_workflow (a modified version of the RANO data preparation MLCube), and examples/HEMnet/data_preparator (a modified version of the HEMnet data preparation code, adapted to run in Airflow and MedPerf).
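
To illustrate the idea of a step-based workflow in which every step is a container, here is a minimal sketch in Python. It does not reproduce the YAML schema introduced by this PR: the keys `steps`, `id`, `image`, `command`, and `depends_on`, the image names, and both helper functions are hypothetical and chosen only for illustration. The workflow is written as the dictionary a YAML loader would produce, so the sketch needs nothing beyond the standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow description (NOT the schema from this PR).
# Two of the three steps share an image to show that images may differ per step.
workflow = {
    "steps": [
        {
            "id": "prepare",
            "image": "example/prepare:1.0",
            "command": ["python", "prepare.py"],
            "depends_on": [],
        },
        {
            "id": "sanity_check",
            "image": "example/checks:2.3",
            "command": ["python", "check.py"],
            "depends_on": ["prepare"],
        },
        {
            "id": "statistics",
            "image": "example/checks:2.3",
            "command": ["python", "stats.py"],
            "depends_on": ["sanity_check"],
        },
    ]
}


def execution_order(workflow):
    """Return step ids in an order that respects every depends_on edge."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {
        step["id"]: set(step.get("depends_on", []))
        for step in workflow["steps"]
    }
    # static_order() raises graphlib.CycleError if the steps form a cycle.
    return list(TopologicalSorter(graph).static_order())


def container_command(step):
    """Build a `docker run` argument list for one step (not executed here)."""
    return ["docker", "run", "--rm", step["image"], *step["command"]]


steps_by_id = {step["id"]: step for step in workflow["steps"]}
order = execution_order(workflow)
commands = [container_command(steps_by_id[step_id]) for step_id in order]

for cmd in commands:
    print(" ".join(cmd))
```

In an Airflow deployment, each entry would more plausibly map to a task in a DAG than to a direct `docker run` call; the sketch only shows how a declarative list of containerized steps resolves into an execution order.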

This PR also updates several MedPerf dependencies for compatibility with the newly added Airflow 3.0.1 dependency. Notably, Pydantic has been updated to v2.11.0, and the code changes this requires are included in this PR.

@RodriguesRBruno RodriguesRBruno marked this pull request as ready for review December 18, 2025 13:05
@RodriguesRBruno RodriguesRBruno requested a review from a team as a code owner December 18, 2025 13:05
@RodriguesRBruno
Contributor Author

The last few pushes update the Airflow dependency to 3.1.5, the most recent version at the time of writing.

@hasan7n hasan7n changed the base branch from main to airflow January 5, 2026 13:20
@hasan7n hasan7n merged commit c15d35e into mlcommons:airflow Jan 5, 2026
8 of 10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 5, 2026