docs: add supported features, parity checks, and perf sections to all model READMEs (#2354)#2420

Open
HemantSudarshan wants to merge 1 commit into pytorch:main from HemantSudarshan:docs/model-perf-parity-checks-2354

Conversation

@HemantSudarshan

Summary

Closes #2354: "Better document for model perf / supported techniques / parity checks"

This PR adds standardized documentation to every model and experiment README, addressing the three requirements from user feedback:

  1. Supported features — each model README now has a comprehensive feature table
  2. Parity check methodology — each model README documents how to verify numerical equivalence against HuggingFace baselines
  3. Performance numbers — Llama 3 includes full H100/H200 benchmark tables; other models have honest placeholders with links to the benchmarks submission guide
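As an illustration of the parity-check idea (KL divergence against a HuggingFace baseline, with values near float64 round-off such as ~1e-13 indicating numerical equivalence), the following is a minimal sketch; the function name `kl_parity` and the tensor shapes are illustrative, not the actual script in this repo:

```python
import torch
import torch.nn.functional as F

def kl_parity(logits_a: torch.Tensor, logits_b: torch.Tensor) -> float:
    """Mean KL divergence between the next-token distributions of two models.

    Values near float64 round-off (~1e-13) indicate numerical equivalence.
    """
    log_p = F.log_softmax(logits_a.double(), dim=-1)
    log_q = F.log_softmax(logits_b.double(), dim=-1)
    # KL(p || q) summed over the vocab, averaged over batch and sequence.
    kl = torch.sum(log_p.exp() * (log_p - log_q), dim=-1)
    return kl.mean().item()

# Identical logits give exactly zero divergence.
logits = torch.randn(4, 16, 128)  # (batch, seq, vocab)
print(kl_parity(logits, logits.clone()))  # → 0.0
```

In practice the two logit tensors would come from running the converted torchtitan checkpoint and the HF baseline on the same input batch.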

All feature claims have been verified against source code across three independent review cycles (parallelize functions, TrainSpec registrations, model configs, runtime guards).


What Changed (15 files, 693 additions, 42 deletions)

Model READMEs (7 files)

| Model | What was added |
| --- | --- |
| Llama 3 (new file) | Full features table (14 rows), 8 model variants, 6 benchmark tables (8B/70B/405B on H100), Async TP speedups, KL divergence parity results (~1e-13), loss convergence matrix |
| Llama 3 FT (new file) | Inherits Llama 3 features + TorchFT-specific additions (fault tolerance, DiLoCo, semi-sync) |
| Llama 4 (rewritten) | Features table (16 rows), 6 variants (standard + iRoPE), MoE-specific parity notes |
| DeepSeek-V3 (rewritten) | Features table (17 rows), MLA-aware TP details, 5 variants (debugmodel → 671B), Float8 rowwise-only caveat |
| Qwen3 (rewritten) | Features table (17 rows), dense + MoE variants, QK-norm TP note, weight-tying details |
| Flux (rewritten) | Features table (17 rows), diffusion-specific parity methodology (MSE, visual comparison), custom trainer details |
| GPT-OSS (rewritten) | Features table (21 rows), FlexAttention/sliding-window/YaRN details, grouped MM, MoE load balancing |

Top-level Feature Matrix (1 file)

torchtitan/models/README.md — Added a 20-row × 7-model comparison table covering FSDP, HSDP, TP, PP, CP, EP, ETP, DDP, AC, torch.compile, Float8, MXFP8, Async TP, Loss Parallel, HF Interop, DualPipeV, Validation, MoE, Custom Trainer, Benchmarks Published.

Experiment READMEs (6 files)

Added parity checks and performance sections to: autoparallel, compiler_toolkit, simple_fsdp, torchcomms, transformers_modeling_backend, vlm.

Tests README (1 file)

tests/README.md — Added a Parity Testing section that addresses the original complaint: "tests directory doesn't seem to have [parity checks]". Points users to scripts/checkpoint_conversion/numerical_tests_example.py with full instructions.
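The core of such a numerical test is a tensor-by-tensor comparison between a converted checkpoint and its baseline. The sketch below is an illustrative stand-alone version, not the contents of `numerical_tests_example.py`; the helper name and tolerances are assumptions:

```python
import torch

def compare_state_dicts(sd_a: dict, sd_b: dict, atol: float = 1e-6, rtol: float = 1e-5):
    """Return (name, max_abs_diff) for every tensor that fails allclose."""
    mismatched = []
    for name, t_a in sd_a.items():
        t_b = sd_b[name]
        if not torch.allclose(t_a, t_b, atol=atol, rtol=rtol):
            mismatched.append((name, (t_a - t_b).abs().max().item()))
    return mismatched

# Bit-identical state dicts produce an empty mismatch list.
sd = {"wq": torch.randn(8, 8), "wk": torch.randn(8, 8)}
print(compare_state_dicts(sd, {k: v.clone() for k, v in sd.items()}))  # → []
```

A real run would load both state dicts from disk (e.g. via `torch.load`) after checkpoint conversion and fail the test on any non-empty result.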


Verification Methodology

Every feature claim was cross-referenced against the actual source code:

| Verification target | Source code checked |
| --- | --- |
| Parallelism support (TP, PP, CP, EP) | `parallelize_*.py`, `TrainSpec.pipelining_fn` |
| torch.compile | `apply_compile()` call presence in parallelize functions |
| Float8 / MXFP8 | `model.converters` config, `Float8ColwiseParallel`/`MXLinear` usage |
| Async TP | `enable_async_tp` parameter value in parallelize calls |
| DualPipeV | `get_dual_pipe_v_flag()` runtime guard (requires `pp_enabled` AND `ep_enabled`) |
| Validation | `build_validator_fn` wiring in `TrainSpec` |
| Parity results | actual output of `scripts/checkpoint_conversion/numerical_tests_example.py` |

Specific corrections made during verification:

  • DeepSeek-V3 PP: changed ❌ → ✅ (has pipeline_llm)
  • Qwen3 PP: changed ✅ → ❌ (pipelining_fn=None)
  • GPT-OSS torch.compile: changed ✅ → ❌ (apply_compile() not called)
  • GPT-OSS Async TP: changed ✅ → ❌ (hardcoded enable_async_tp=False)
  • GPT-OSS/Qwen3 DualPipeV: changed ✅ → ❌ (requires PP, which both lack)
  • Qwen3 Validation: changed ❌ → ✅ (build_validator is wired)
  • MXFP8: standardized to ⚠️ Untested for DeepSeek-V3, Llama 4, Qwen3 (generic converter available but not validated)
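Corrections like these follow mechanically from the registered `TrainSpec` fields. The snippet below sketches that audit with a simplified stand-in dataclass (only the field names `pipelining_fn` and `build_validator_fn` are taken from the text above; the rest is illustrative):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TrainSpec:
    """Simplified stand-in for torchtitan's TrainSpec registration."""
    name: str
    pipelining_fn: Optional[Callable] = None
    build_validator_fn: Optional[Callable] = None

def feature_row(spec: TrainSpec) -> dict:
    """Derive feature-matrix cells from what is actually wired up."""
    return {
        "PP": "✅" if spec.pipelining_fn is not None else "❌",
        "Validation": "✅" if spec.build_validator_fn is not None else "❌",
    }

# Mirrors the Qwen3 corrections: pipelining_fn=None but a validator is wired.
qwen3 = TrainSpec("qwen3", pipelining_fn=None, build_validator_fn=lambda: None)
print(feature_row(qwen3))  # → {'PP': '❌', 'Validation': '✅'}
```

Deriving cells from the registration rather than hand-maintaining them is what makes the matrix auditable against source.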

What This PR Does NOT Do

  • No code changes — documentation only, zero risk to training code
  • No new test scripts — documents existing numerical_tests_example.py rather than creating new ones
  • No fabricated benchmarks — only Llama 3 has published numbers; all other models honestly state "no benchmarks published yet"

How to Review

  1. Feature tables: Spot-check any row against the corresponding parallelize_*.py or __init__.py TrainSpec
  2. Parity sections: Verify the methodology matches scripts/checkpoint_conversion/README.md
  3. Top-level matrix: Cross-check any cell against the individual model README
  4. tests/README.md: Confirm the Parity Testing section correctly describes numerical_tests_example.py

@meta-cla

meta-cla bot commented Feb 22, 2026

Hi @HemantSudarshan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Feb 22, 2026
@meta-cla

meta-cla bot commented Feb 22, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

… model READMEs

Addresses pytorch#2354 — Better document for model perf / supported techniques / parity checks.

Changes:
- Add comprehensive supported features tables to all 7 model READMEs
  (llama3, llama3_ft, llama4, deepseek_v3, qwen3, flux, gpt_oss)
- Add cross-model feature matrix to torchtitan/models/README.md
- Add parity check methodology sections with HF baseline comparison
- Add performance sections with Llama 3 H100/H200 benchmarks
- Add parity/performance sections to 6 experiment READMEs
- Add Parity Testing section to tests/README.md pointing to
  scripts/checkpoint_conversion/numerical_tests_example.py

All feature claims verified against source code (parallelize_fn,
TrainSpec, model configs). Three review cycles performed.
@HemantSudarshan force-pushed the docs/model-perf-parity-checks-2354 branch from 81b9c05 to cd8841c on February 22, 2026 at 10:52
