
[BC Breaking] Config System Refactor: TOML to Python Dataclass Registry#2386

Merged
tianyu-l merged 17 commits into main from config on Feb 23, 2026

Conversation

@tianyu-l
Contributor

@tianyu-l commented Feb 17, 2026

NOTE: This PR is a large refactor of the codebase. https://github.com/pytorch/torchtitan/releases/tag/v0.2.2 is the latest release cut right before this PR was merged.

author's note

This refactor is mainly trying to address two issues:

  • bad encapsulation: previously a monolithic JobConfig leaked everywhere
  • hard to iterate and experiment on model architecture and training components

The main changes are:

  • Strict encapsulation, even at the cost of a (hopefully temporary) bloated interface when calling subcomponents (e.g. the validator). We should try to find the right abstraction for cross-component visibility.
  • Each Configurable component owns its own Config, which builds the owner component. This achieves modularization via polymorphism and inheritance, both classic concepts in OOP.
  • The main entry point switches from TOML files to Python functions (a.k.a. config_registry.py in each model).
    • TOML has the constraint that everything needs to be registered explicitly before it can be used, e.g. our quantization components need to be registered with string names. Python's language-level implicit registration is what we believe to be more minimal, and should be fairly easy to extend/modify to support TOML/YAML when users build upon / fork torchtitan.
    • That said, Python config provides much more power, e.g. one can use arbitrary logic to create (the config of) a component, which is hard to express with TOML/YAML, thus creating extra difficulty when users want to migrate to their own favorite config system. The only thing we can do is stay conservative in the usage of such power.
  • We still use tyro to convert the config dataclass to a CLI, still with the limitation that users need to construct customized config classes all the way from the root level (Trainer.Config now, JobConfig in the past).
    • If the CLI is not needed, a new trainer (or any high-level) config is not required.
    • To support "polymorphic construction" from CLI without the hassle, check out chz.

This PR also

Remaining work

Longer-term issues

  • More careful design about what to put in config vs. runtime build kwargs. (thanks @ailzhang)
  • ModelSpec is not serializable. There may be multiple solutions, but we can potentially consolidate model.py and parallelize.py by
    • sharing AC, compile, DP application across all Decoder models
    • putting the per-module TP/CP/EP sharding plan inside the model itself
  • Right now BaseModel.update_from_config violates encapsulation by passing the Trainer config into the Model config. This could be avoided by Python logic either at config construction time or in the trainer.
  • Refactor init_weights into Module.Config instead of staying in Module
    • The benefit is that param init becomes configurable; o/w we are coupling the module implementation and its weight init.
    • This may require a refactor of the current TransformerBlock and its config. E.g. weight_init_std may need to be put in config, with __post_init__ determining its value. (See related complaints / discussions on __post_init__ by chz)

Note to reviewer:
Although I believe the changes in this PR come naturally as a bundle, you may (or may not) find the stack of 16 commits easier to review, as I tried to split the changes in some logical manner. I apologize for the giant PR.

claude-generated summary

Summary

This PR refactors torchtitan's configuration and training infrastructure in 15 incremental, backwards-incompatible commits. The central change replaces TOML config files and a monolithic JobConfig parser with typed Python dataclass configs, a Configurable base class pattern, and a config_registry module per model.

270 files changed, 10,025 insertions, 11,418 deletions.


Motivation

The previous system used TOML files parsed by a custom ConfigManager that layered CLI overrides on top. While simple, this had several friction points:

  1. No type safety at config boundaries. TOML values are strings/ints/floats parsed at runtime. A typo in a key name (e.g., training.stpes) silently becomes a default value.
  2. Flat namespace. All config sections ([model], [training], [optimizer], [checkpoint], ...) lived in a single JobConfig class. Every component received the full JobConfig even when it only needed a few fields.
  3. Experiment extension was ad-hoc. Experiments that needed custom config fields (e.g., SimpleFSDP's compile.graph_passes or FaultTolerant's fault_tolerance.*) required a custom_config_module TOML key and a runtime _merge_configs call to graft new fields onto JobConfig.
  4. Model args were disconnected from model code. A ModelArgs dataclass in args.py defined hyperparameters, but the TrainSpec that bundled model + parallelization + loss was registered separately, with no type-level link between them.

What Changed

1. Configurable Base Class

A new Configurable base class (torchtitan/config/configurable.py) establishes a universal pattern:

class Configurable:
    @dataclass(kw_only=True, slots=True)
    class Config:
        def build(self, **kwargs):
            return self._owner(config=self, **kwargs)

    def __init_subclass__(cls, **kwargs):
        # Auto-wires Config.build() -> cls(config=..., **kwargs)
        # Enforces @dataclass(kw_only=True, slots=True) on every Config

Every configurable component (Trainer, model, optimizer, tokenizer, dataloader, checkpoint manager, metrics, validators, quantization converters, ...) follows this pattern. Calling config.build() constructs the owning class.

2. Trainer.Config Replaces JobConfig

The monolithic JobConfig is replaced by Trainer.Config, a nested dataclass that aggregates typed sub-configs:

class Trainer(Stateful, Configurable):
    @dataclass(kw_only=True, slots=True)
    class Config(Configurable.Config):
        model_spec: ModelSpec | None = None    # set by config_registry, suppressed from CLI
        job: JobConfig = ...
        training: TrainingConfig = ...
        parallelism: ParallelismConfig = ...
        optimizer: OptimizersContainer.Config = ...
        lr_scheduler: LRSchedulersContainer.Config = ...
        checkpoint: CheckpointManager.Config = ...
        dataloader: BaseDataLoader.Config = ...
        metrics: MetricsProcessor.Config = ...
        # ... etc.

Each sub-config is the Config class of the component that consumes it (e.g., CheckpointManager.Config is defined inside CheckpointManager). Components receive only their own config, not the entire training config.

3. config_registry.py Replaces TOML Files

Each model defines a config_registry.py with functions that return complete Trainer.Config instances:

# torchtitan/models/llama3/config_registry.py

def llama3_debugmodel() -> Trainer.Config:
    return Trainer.Config(
        job=JobConfig(description="Llama 3 debug training", ...),
        model_spec=model_registry("debugmodel"),
        optimizer=OptimizersContainer.Config(lr=8e-4),
        training=TrainingConfig(local_batch_size=8, seq_len=2048, steps=10),
        dataloader=HuggingFaceTextDataLoader.Config(dataset="c4_test"),
        # ...
    )

def llama3_debugmodel_float8() -> Trainer.Config:
    config = llama3_debugmodel()
    config.model_converters = ModelConvertersContainer.Config(
        converters=[Float8LinearConverter.Config(enable_fsdp_float8_all_gather=True)]
    )
    return config

4. TrainSpec -> ModelSpec

TrainSpec is renamed to ModelSpec with a narrower scope: it holds only model-specific concerns (model config, parallelization function, loss function, state dict adapter). All training-level concerns (optimizer, LR scheduler, checkpointing, etc.) live in Trainer.Config.

5. Model Configs: Flat ModelArgs -> Nested Dataclass Hierarchy

Model hyperparameters move from a flat ModelArgs dataclass into a nested Config hierarchy that mirrors the module tree:

# Before (main): flat args.py
@dataclass
class ModelArgs:
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    # ... 20+ flat fields

# After (this PR): nested Config in model class
class Llama3Model(Decoder):
    @dataclass(kw_only=True, slots=True)
    class Config(Decoder.Config):
        layer: Llama3TransformerBlock.Config  # contains attention + FFN configs
        rope: RoPE.Config                    # contains RoPE-specific params

6. train.py Split

The monolithic train.py (~800 lines) is split into:

  • train.py (~60 lines): thin entry point that calls ConfigManager.parse_args() and config.build()
  • trainer.py (~850 lines): the Trainer class with training loop logic

7. Experiment Extension via Inheritance

Experiments extend the config system through dataclass subclassing instead of runtime config merging:

# torchtitan/experiments/simple_fsdp/configs.py
@dataclass(kw_only=True, slots=True)
class SimpleFSDPConfig(Trainer.Config):
    compile: SimpleFSDPCompileConfig = field(default_factory=SimpleFSDPCompileConfig)

Their config_registry.py returns the subclassed config type, and tyro auto-generates CLI parsing for the extended fields.


UX Comparison

Launching Training

# Before (main)
CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh

# After (this PR)
MODEL=llama3 CONFIG=llama3_8b ./run_train.sh

CLI Overrides

# Before (main)
CONFIG_FILE="./torchtitan/models/llama3/train_configs/debug_model.toml" ./run_train.sh \
  --training.steps 100 --parallelism.tensor_parallel_degree 2

# After (this PR)
./run_train.sh --training.steps 100 --parallelism.tensor_parallel_degree 2
# (defaults to MODEL=llama3, CONFIG=llama3_debugmodel via run_train.sh)

CLI override syntax is unchanged (--section.field value), but tyro now provides typed --help output generated from the dataclass tree.
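To make the override flow concrete, here is an illustrative stdlib-only sketch of how a dotted --section.field flag maps onto a nested dataclass. The PR itself uses tyro for this; the helper and field names below are hypothetical, showing only the idea.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Training:          # hypothetical stand-in for TrainingConfig
    steps: int = 10
    seq_len: int = 2048


@dataclass
class Config:            # hypothetical stand-in for Trainer.Config
    training: Training = field(default_factory=Training)


def apply_overrides(cfg, argv):
    # argv like ["--training.steps", "100"]: walk the dotted path,
    # cast the raw string using the declared field type, and assign.
    for flag, raw in zip(argv[::2], argv[1::2]):
        *path, leaf = flag.lstrip("-").split(".")
        obj = cfg
        for part in path:
            obj = getattr(obj, part)
        ftype = {f.name: f.type for f in fields(obj)}[leaf]
        setattr(obj, leaf, ftype(raw))
    return cfg


cfg = apply_overrides(Config(), ["--training.steps", "100"])
assert cfg.training.steps == 100
```

The declared field types are what make overrides typed: a value that cannot be cast fails loudly instead of silently flowing through as a string.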

Defining a New Model Config

# Before: create a new TOML file, copy-paste sections, edit values
cp train_configs/debug_model.toml train_configs/my_experiment.toml
vim train_configs/my_experiment.toml

# After: write a Python function that mutates an existing config
def my_experiment() -> Trainer.Config:
    config = llama3_debugmodel()
    config.training.steps = 100
    config.optimizer.lr = 1e-4
    return config

Adding Experiment-Specific Config Fields

# Before (main): custom_config_module in TOML + runtime _merge_configs
# Requires: TOML key pointing to a Python module, dynamic dataclass creation

# After (this PR): dataclass inheritance
@dataclass(kw_only=True, slots=True)
class MyExperimentConfig(Trainer.Config):
    my_custom_field: str = "default"

Float8 / Quantization Configuration

# Before (main): TOML section
# [quantize.linear.float8]
# enable_fsdp_float8_all_gather = true
# precompute_float8_dynamic_scale_for_fsdp = true

# After (this PR): typed config object
model_converters=ModelConvertersContainer.Config(
    converters=[
        Float8LinearConverter.Config(
            enable_fsdp_float8_all_gather=True,
            precompute_float8_dynamic_scale_for_fsdp=True,
        ),
    ],
),

Limitations and Trade-offs

1. Configs are no longer declarative text files

TOML files were readable by anyone without Python knowledge. The new config_registry functions are Python code, which requires understanding imports, function calls, and dataclass construction. For users who only need to tweak hyperparameters, the CLI override syntax (--training.steps 100) works the same, but understanding the full config requires reading Python.

2. Steeper learning curve for contributors

Adding a new model now requires understanding the Configurable protocol, nested Config dataclass hierarchy, and the config_registry pattern. The old approach of copying a TOML file and editing values had a lower barrier to entry.

3. Config serialization is more complex

TOML files were trivially serializable and diffable. The new system supports to_dict() + JSON serialization, but configs containing callables (e.g., ModelSpec.parallelize_fn) cannot be fully round-tripped. The model_spec field is excluded from serialization and suppressed from CLI parsing.
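A rough sketch of this trade-off, using hypothetical stand-in classes (the real Trainer.Config and ModelSpec differ):

```python
import dataclasses
import json


@dataclasses.dataclass
class ModelSpec:                      # hypothetical stand-in
    name: str = "llama3"
    parallelize_fn: object = print    # a callable: not JSON-serializable


@dataclasses.dataclass
class TrainerConfig:                  # hypothetical stand-in
    steps: int = 10
    model_spec: ModelSpec = dataclasses.field(default_factory=ModelSpec)


def to_dict(cfg):
    # Drop the callable-bearing field before dumping, mirroring how
    # model_spec is excluded from serialization in this PR.
    d = dataclasses.asdict(cfg)
    d.pop("model_spec", None)
    return d


print(json.dumps(to_dict(TrainerConfig())))  # {"steps": 10}
```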

4. tyro dependency

The CLI parsing now depends on tyro, a third-party library. While tyro is well-maintained and provides typed CLI generation from dataclasses, it is an additional dependency that must be kept compatible with the dataclass patterns used here.

5. @dataclass(slots=True) constraints

The Configurable base class enforces @dataclass(kw_only=True, slots=True) on all Config classes. While this provides memory efficiency and prevents accidental attribute assignment, slots=True prevents dynamic attribute addition and makes multiple inheritance with other slotted classes more constrained. Each Config subclass in a deep hierarchy must repeat the @dataclass(kw_only=True, slots=True) decorator.

6. Two-level indirection for model selection

The old system required one identifier: --job.config_file path/to/file.toml. The new system requires two: --module llama3 --config llama3_8b. While this separates model identity from training recipe, it adds an extra argument.


Numerics Verification

All model configs were verified for numerical equivalence against the main branch (commit 10d8a306):

NOTE

  • only models that can fit on 8 GPUs are tested
  • only a subset of parallelism combinations is tested
Model | Status | Notes
llama3 (debugmodel, 8B) | Bitwise match |
llama3 (debugmodel_flex_attn) | Bitwise match |
qwen3 (0.6B, 1.7B, 32B, MoE debugmodel) | Bitwise match |
deepseek_v3 (debugmodel, 16B) | Close (max diff 0.00014) | Pre-existing main branch bug: missing eps in final RMSNorm
llama4 (debugmodel) | Bitwise match | _irope variants don't work on main (FlexAttn: 'dict' object has no attribute 'BLOCK_SIZE') but work after this PR
gpt_oss (debugmodel) | See notes | --debug.deterministic causes loss to be NaN; o/w first-step loss matches, minor difference after (likely caused by flex?)
flux | Bitwise match |

Migration Guide

Old (main) | New (this PR)
CONFIG_FILE="path/to/config.toml" ./run_train.sh | MODEL=llama3 CONFIG=llama3_8b ./run_train.sh
--job.config_file path.toml | --module llama3 --config llama3_8b
train_configs/*.toml | config_registry.py functions
TrainSpec | ModelSpec
ModelArgs / args.py | Nested Model.Config dataclass
custom_config_module + _merge_configs() | Subclass Trainer.Config
build_model_converters() free function | ModelConvertersContainer.Config.build()
build_metrics_processor() free function | MetricsProcessor.Config.build()

@wconstab
Contributor

Have you tried asking claude to help split this into a stack of PRs that can be reviewed/landed independently? I did see your comment/apology to reviewers, but I still think honestly nobody is going to review this PR in its entirety. So are you asking for an uncareful scan and a stamp, or do you want to break out important pieces of the code that you want careful review on?

@tianyu-l
Contributor Author

@wconstab While I understand how intimidating it can be to review a huge PR, I would like to initially deliver the package as a whole instead of letting people only see incremental changes (if that's possible at all). What I'd like to achieve:

  • [alignment] most reviewers would pick a single model / config registry, play with it, and convince themselves the change looks OK in general. Correctness would be guaranteed by my numerics test and CI (will fix at least the core ones)
  • [more careful check] If reviewers get aligned and would like to help review line-by-line, I'm more than happy to split this into a stack of PRs.

model: nn.Module,
parallel_dims: ParallelDims,
job_config: JobConfig,
*,
Contributor

what's the rule for differentiating the positional args and kw args?

Contributor Author

I'm inspired by https://github.com/apple/axlearn/blob/main/docs/ml_api_style.md#avoid-multiple-positional-arguments. Here I'm moving parallel_dims to kwarg as well.

Contributor @acisseJZhong commented Feb 19, 2026

limit the number of positional arguments to <= 1 and use keyword arguments for the rest

what's the reason not making them all kwargs?

Contributor Author

I don't know for sure. Likely because for some functions there would always be a "main" arg that is always there and doesn't introduce ambiguity / error-proneness. E.g. if a function only takes one arg, like parallelize(model), maybe it's fine? You can imagine that later on, as people add more and more optional kwargs to the function, the model part doesn't need to change.

train_spec: TrainSpec,
def register_torchtitan_model_from_model_spec(
model_spec: ModelSpec,
model_name: str,
Contributor

model_name should be part of model_spec

Contributor

Model name refers to another thing: model_name: Name to register in vLLM (e.g., "Qwen3TorchTitanForCausalLM"). Maybe need a more descriptive name here, but that could be done in a separate PR

Contributor @wwwjn left a comment

Mainly took a look on components, config, experiments/rl, models, train.py and trainer.py

Contributor @fegin left a comment

The overall direction looks good to me; will go over again in more detail. The only uncertainty is what's the best way to handle the shared configurations, as mentioned in the review below.

Comment on lines 407 to 413
parallel_dims: ParallelDims,
dump_folder: str = "./outputs",
pp_schedule: str = "1F1B",
ft_enable: bool = False,
ft_replica_id: int = 0,
config_dict: dict[str, Any] | None = None,
tag: str | None = None,
Contributor

I also am worried about the kwargs being a loophole where people passing configurations around.

Contributor @fegin commented Feb 18, 2026

An alternative approach is to require each component define these shared configurations and resolve the shared configurations when constructing the root configuration (Trainer.Config).

Contributor Author

Yeah, this is the top issue I put in "Longer-term Issues" in PR summary, which I couldn't handle entirely in this initial PR.

First, we need to figure out the boundary between "shared config" and "runtime kwargs". We can use more shared config, but that is "utilizing" (a.k.a. "abusing") the Python config power in a way that makes it harder to transform into a pure YAML solution, which may be OK.

More importantly, we need to reconsider whether the current function calling structure makes sense at all. Current metrics logging is limited and hard to customize -- e.g. in MoE, how do we log the number of tokens each expert processes? In this sense, such problems reflect design flaws we have in torchtitan -- in the past, these were omitted due to the usage of JobConfig everywhere. I think this is one of the good things about this refactor.



register_model_converter(Float8LinearConverter, "quantize.linear.float8")
register_model_converter(Float8GroupedMMConverter, "quantize.grouped_mm.float8")
Contributor

i see this removes the converter names (quantize.grouped_mm.float8 etc) - what does the command line API for this look like now?

Contributor Author

We lose CLI capability for adjusting this, because there is no string attached to each converter anymore.

@tianyu-l merged commit 9810191 into main Feb 23, 2026
29 of 44 checks passed
github-project-automation bot moved this from In Progress to Done in 26H1 TorchTitan Development, Feb 23, 2026
hann-wang added a commit to hann-wang/torchtitan that referenced this pull request Feb 24, 2026
* support launching custom trainer;
* init trainer components through .build() (pytorch#2386);
* move data to GPU by micro-batch;
* remove rescale_accumulated_loss (pytorch#2206).
xmfan added a commit to meta-pytorch/autoparallel that referenced this pull request Feb 24, 2026
Torchtitan merged a BC-breaking config system refactor (pytorch/torchtitan#2386)
that replaced TOML configs with Python dataclass configs and changed the
CLI from CONFIG_FILE + --model.name to --module + --config.

Updates the CI commands accordingly. Also fixes a runtime crash where
aliased buffers (registered for user-facing API compat by #321) were
being passed to the compiled graph, which only expects the canonical
(deduplicated) set. The deepseek_v3 test is commented out as it's also
disabled in torchtitan's own CI.

Authored with Claude.
tianyu-l pushed a commit that referenced this pull request Feb 25, 2026
* support launching custom trainer;
* init trainer components through .build() (#2386);
* move data to GPU by micro-batch;
* remove rescale_accumulated_loss (#2206).

Labels

ciflow/8gpu, CLA Signed

Projects

Status: Done


9 participants