We bring the spirit of nanogpt-speedrun into the omni-modal world
We have witnessed impressive improvements in diffusion models, especially encoder-side advances such as REPA/E-REPA/RAE/REG, along with enhancements in other areas. Prompted by the SpeedRun-DiT experiment, we as an open-source community think it is worth attempting to build a speedrun (in the spirit of the great nanogpt-speedrun and marin-speedrun) for diffusion models and multi-modal generation tasks.
- ImageGen SpeedRun Ruleset: Interested parties should make their submissions in accordance with the first version of the general ruleset. We are looking for first submissions in two directions: a basic setup with a bare-minimum ViT, and innovations with external modules (such as RAE, REG, etc.).
- Baseline implementation for speedrun improvements on external modules: Interested parties can refer to the latest SR-DiT implementation for REG if they don't have a particular reference implementation of a minimum setup in mind.
From the community discussion, it was agreed that the specs used for submission metrics should be merged into one basic ruleset, while keeping the flexibility to allow free-form submissions.
For the baseline codebase on the external-module side (vision encoders, etc.), submitters can, for the initial release, refer to the latest SR-DiT implementation for REG if they don't have a particular reference implementation of a minimum setup in mind.
For the basic, simple setup, we are calling for proposals in lieu of nano-vits.
From the community discussion, we intend to kick-start the ImageGen SpeedRun with two tracks:
- a basic track that facilitates rapid iteration on a simple ViT/DiT-structured model in a limited hardware setting
- an external-module track that supports the recent rapid research on the REPA series of innovations, with large datasets and a lab-level hardware setting
However, it was also agreed that although the two proposed tracks each target a different direction and scenario, the criteria and metrics we use should be largely aligned, with only certain caveats for each track. We should avoid diverging into two completely different measurement systems.
Note that this effort is purely grassroots, with the support of the LF AI & Data Foundation for the Open Model Initiative. It will undergo some dramatic changes, but we intend to document every discussion publicly and make the process as transparent as possible.
- Assembly of initial technical expert teams
- Draft proposal of the measurement criteria of the initial tracks
- First version of the track requirements published
- Confirmation of the first round CFP announcement (content, time, place)
- Confirmation of the hardware resources
- Confirmation of the track review team (for cycle 2026H1)
- Welcome and start reviewing the first submissions