docs: add supported features, parity checks, and perf sections to all model READMEs (#2354)#2420

Open
HemantSudarshan wants to merge 1 commit into pytorch:main from HemantSudarshan:docs/model-perf-parity-checks-2354

Conversation

@HemantSudarshan

Summary

Closes #2354: "Better document for model perf / supported techniques / parity checks"

This PR adds standardized documentation to every model and experiment README, addressing the three requirements from user feedback:

  1. Supported features — each model README now has a comprehensive feature table
  2. Parity check methodology — each model README documents how to verify numerical equivalence against HuggingFace baselines
  3. Performance numbers — Llama 3 includes full H100/H200 benchmark tables; other models have honest placeholders with links to the benchmarks submission guide
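As an illustration of the parity-check idea (KL divergence against a HuggingFace baseline, with values near float64 round-off such as ~1e-13 indicating numerical equivalence), the following is a minimal sketch; the function name `kl_parity` and the tensor shapes are illustrative, not the actual script in this repo:

```python
import torch
import torch.nn.functional as F

def kl_parity(logits_a: torch.Tensor, logits_b: torch.Tensor) -> float:
    """Mean KL divergence between the next-token distributions of two models.

    Values near float64 round-off (~1e-13) indicate numerical equivalence.
    """
    log_p = F.log_softmax(logits_a.double(), dim=-1)
    log_q = F.log_softmax(logits_b.double(), dim=-1)
    # KL(p || q) summed over the vocab, averaged over batch and sequence.
    kl = torch.sum(log_p.exp() * (log_p - log_q), dim=-1)
    return kl.mean().item()

# Identical logits give exactly zero divergence.
logits = torch.randn(4, 16, 128)  # (batch, seq, vocab)
print(kl_parity(logits, logits.clone()))  # → 0.0
```

In practice the two logit tensors would come from running the converted torchtitan checkpoint and the HF baseline on the same input batch.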

All feature claims have been verified against source code across three independent review cycles (parallelize functions, TrainSpec registrations, model configs, runtime guards).


What Changed (15 files, 693 additions, 42 deletions)

Model READMEs (7 files)

| Model | What was added |
| --- | --- |
| Llama 3 (new file) | Full features table (14 rows), 8 model variants, 6 benchmark tables (8B/70B/405B on H100), Async TP speedups, KL divergence parity results (~1e-13), loss convergence matrix |
| Llama 3 FT (new file) | Inherits Llama 3 features + TorchFT-specific additions (fault tolerance, DiLoCo, semi-sync) |
| Llama 4 (rewritten) | Features table (16 rows), 6 variants (standard + iRoPE), MoE-specific parity notes |
| DeepSeek-V3 (rewritten) | Features table (17 rows), MLA-aware TP details, 5 variants (debugmodel → 671B), Float8 rowwise-only caveat |
| Qwen3 (rewritten) | Features table (17 rows), dense + MoE variants, QK-norm TP note, weight-tying details |
| Flux (rewritten) | Features table (17 rows), diffusion-specific parity methodology (MSE, visual comparison), custom trainer details |
| GPT-OSS (rewritten) | Features table (21 rows), FlexAttention/sliding-window/YaRN details, grouped MM, MoE load balancing |

Top-level Feature Matrix (1 file)

torchtitan/models/README.md — Added a 20-row × 7-model comparison table covering FSDP, HSDP, TP, PP, CP, EP, ETP, DDP, AC, torch.compile, Float8, MXFP8, Async TP, Loss Parallel, HF Interop, DualPipeV, Validation, MoE, Custom Trainer, Benchmarks Published.

Experiment READMEs (6 files)

Added parity checks and performance sections to: autoparallel, compiler_toolkit, simple_fsdp, torchcomms, transformers_modeling_backend, vlm.

Tests README (1 file)

tests/README.md — Added a Parity Testing section that addresses the original complaint: "tests directory doesn't seem to have [parity checks]". Points users to scripts/checkpoint_conversion/numerical_tests_example.py with full instructions.
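The core of such a numerical test is a tensor-by-tensor comparison between a converted checkpoint and its baseline. The sketch below is an illustrative stand-alone version, not the contents of `numerical_tests_example.py`; the helper name and tolerances are assumptions:

```python
import torch

def compare_state_dicts(sd_a: dict, sd_b: dict, atol: float = 1e-6, rtol: float = 1e-5):
    """Return (name, max_abs_diff) for every tensor that fails allclose."""
    mismatched = []
    for name, t_a in sd_a.items():
        t_b = sd_b[name]
        if not torch.allclose(t_a, t_b, atol=atol, rtol=rtol):
            mismatched.append((name, (t_a - t_b).abs().max().item()))
    return mismatched

# Bit-identical state dicts produce an empty mismatch list.
sd = {"wq": torch.randn(8, 8), "wk": torch.randn(8, 8)}
print(compare_state_dicts(sd, {k: v.clone() for k, v in sd.items()}))  # → []
```

A real run would load both state dicts from disk (e.g. via `torch.load`) after checkpoint conversion and fail the test on any non-empty result.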


Verification Methodology

Every feature claim was cross-referenced against the actual source code:

| Verification target | Source code checked |
| --- | --- |
| Parallelism support (TP, PP, CP, EP) | `parallelize_*.py`, `TrainSpec.pipelining_fn` |
| torch.compile | `apply_compile()` call presence in parallelize functions |
| Float8 / MXFP8 | `model.converters` config, `Float8ColwiseParallel`/`MXLinear` usage |
| Async TP | `enable_async_tp` parameter value in parallelize calls |
| DualPipeV | `get_dual_pipe_v_flag()` runtime guard (requires `pp_enabled` AND `ep_enabled`) |
| Validation | `build_validator_fn` wiring in `TrainSpec` |
| Parity results | actual output of `scripts/checkpoint_conversion/numerical_tests_example.py` |

Specific corrections made during verification:

  • DeepSeek-V3 PP: changed ❌ → ✅ (has pipeline_llm)
  • Qwen3 PP: changed ✅ → ❌ (pipelining_fn=None)
  • GPT-OSS torch.compile: changed ✅ → ❌ (apply_compile() not called)
  • GPT-OSS Async TP: changed ✅ → ❌ (hardcoded enable_async_tp=False)
  • GPT-OSS/Qwen3 DualPipeV: changed ✅ → ❌ (requires PP, which both lack)
  • Qwen3 Validation: changed ❌ → ✅ (build_validator is wired)
  • MXFP8: standardized to ⚠️ Untested for DeepSeek-V3, Llama 4, Qwen3 (generic converter available but not validated)
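Corrections like these follow mechanically from the registered `TrainSpec` fields. The snippet below sketches that audit with a simplified stand-in dataclass (only the field names `pipelining_fn` and `build_validator_fn` are taken from the text above; the rest is illustrative):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TrainSpec:
    """Simplified stand-in for torchtitan's TrainSpec registration."""
    name: str
    pipelining_fn: Optional[Callable] = None
    build_validator_fn: Optional[Callable] = None

def feature_row(spec: TrainSpec) -> dict:
    """Derive feature-matrix cells from what is actually wired up."""
    return {
        "PP": "✅" if spec.pipelining_fn is not None else "❌",
        "Validation": "✅" if spec.build_validator_fn is not None else "❌",
    }

# Mirrors the Qwen3 corrections: pipelining_fn=None but a validator is wired.
qwen3 = TrainSpec("qwen3", pipelining_fn=None, build_validator_fn=lambda: None)
print(feature_row(qwen3))  # → {'PP': '❌', 'Validation': '✅'}
```

Deriving cells from the registration rather than hand-maintaining them is what makes the matrix auditable against source.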

What This PR Does NOT Do

  • No code changes — documentation only, zero risk to training code
  • No new test scripts — documents existing numerical_tests_example.py rather than creating new ones
  • No fabricated benchmarks — only Llama 3 has published numbers; all other models honestly state "no benchmarks published yet"

How to Review

  1. Feature tables: Spot-check any row against the corresponding parallelize_*.py or __init__.py TrainSpec
  2. Parity sections: Verify the methodology matches scripts/checkpoint_conversion/README.md
  3. Top-level matrix: Cross-check any cell against the individual model README
  4. tests/README.md: Confirm the Parity Testing section correctly describes numerical_tests_example.py

@meta-cla

meta-cla bot commented Feb 22, 2026

Hi @HemantSudarshan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Feb 22, 2026
@meta-cla

meta-cla bot commented Feb 22, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

… model READMEs

Addresses pytorch#2354 — Better document for model perf / supported techniques / parity checks.

Changes:
- Add comprehensive supported features tables to all 7 model READMEs
  (llama3, llama3_ft, llama4, deepseek_v3, qwen3, flux, gpt_oss)
- Add cross-model feature matrix to torchtitan/models/README.md
- Add parity check methodology sections with HF baseline comparison
- Add performance sections with Llama 3 H100/H200 benchmarks
- Add parity/performance sections to 6 experiment READMEs
- Add Parity Testing section to tests/README.md pointing to
  scripts/checkpoint_conversion/numerical_tests_example.py

All feature claims verified against source code (parallelize_fn,
TrainSpec, model configs). Three review cycles performed.
@HemantSudarshan force-pushed the docs/model-perf-parity-checks-2354 branch from 81b9c05 to cd8841c on February 22, 2026 at 10:52
