
[Module] Add configurable RMSNorm that inherits from Module#2434

Draft
fegin wants to merge 2 commits into gh/fegin/82/base from
gh/fegin/82/head

Conversation


@fegin fegin commented Feb 24, 2026

Stack from ghstack (oldest at bottom):

BC Breaking
This is a breaking change for any downstream code that passes norm_eps directly to model configs.

Why the Change
Same as #2428. This PR changes RMSNorm to inherit from Module.

Summary
Introduces RMSNorm, a configurable wrapper around nn.RMSNorm that inherits from Module. This enables RMSNorm to participate in the Config.build() pattern and provides a standardized init_weights() method.

All model families (Llama3, Llama4, Qwen3, DeepSeekV3, GptOss, Flux) are updated to:

  • Use RMSNorm.Config fields in their dataclass configs instead of bare norm_eps: float
  • Build norms via config.build(normalized_shape=dim) instead of nn.RMSNorm(dim, eps=...)
  • Call norm.init_weights() instead of norm.reset_parameters()

The norm_eps field is removed from TransformerBlock.Config and Decoder.Config since the eps value is now encapsulated in RMSNorm.Config.

[ghstack-poisoned]
@meta-cla meta-cla bot added the CLA Signed label Feb 24, 2026
fegin added a commit that referenced this pull request Feb 24, 2026
ghstack-source-id: bf4d8ec
Pull-Request: #2434
@fegin fegin marked this pull request as draft February 24, 2026 21:28
# Shared config object: safe because RMSNorm.Config is an immutable-style
# dataclass (slots=True, no mutable fields). If mutable fields are ever
# added, each model variant should get its own instance instead.
norm_config = RMSNorm.Config(eps=1e-6)
question 1: why is only Qwen3 using this shared norm config, and not the other models?
question 2: if it's shared by multiple nodes in the config tree, including the root node, why don't we let the root node pass the config down to its children, just like hidden dimension / vocab_size / etc.?

Comment on lines 53 to 54
attention_norm: RMSNorm.Config = field(default_factory=RMSNorm.Config)
ffn_norm: RMSNorm.Config = field(default_factory=RMSNorm.Config)
don't need default_factory here?
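For context on the default_factory question, a minimal sketch (illustrative names, not the PR's actual classes) of the dataclass rule involved: an unfrozen dataclass instance is unhashable, and on Python 3.11+ unhashable objects are rejected as class-level field defaults, so default_factory is the standard way to give each config its own instance.

```python
from dataclasses import dataclass, field

@dataclass
class NormConfig:
    eps: float = 1e-6

@dataclass
class BlockConfig:
    # `attention_norm: NormConfig = NormConfig()` would raise ValueError on
    # Python 3.11+ (unhashable default); default_factory sidesteps that and
    # gives every BlockConfig its own NormConfig instance.
    attention_norm: NormConfig = field(default_factory=NormConfig)
    ffn_norm: NormConfig = field(default_factory=NormConfig)

a, b = BlockConfig(), BlockConfig()
```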

[ghstack-poisoned]
fegin added a commit that referenced this pull request Feb 24, 2026
ghstack-source-id: 207cf6a
Pull-Request: #2434

Labels

ciflow/8gpu, CLA Signed (managed by the Meta Open Source bot)

2 participants