[WIP][rl] enable batch-invariant mode in RL loop by wwwjn · Pull Request #2395 · pytorch/torchtitan

wwwjn · 2026-02-19T03:29:03Z

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

ghstack-source-id: ae5abed Pull-Request: #2395

acisseJZhong · 2026-02-19T06:22:07Z

torchtitan/experiments/rl/unified/actors/trainer.py

+        # Batch invariant mode: set NCCL determinism env vars
+        policy_opt = job_config.policy_optimization
+        if policy_opt.batch_invariant_mode:
+            _set_nccl_determinism_envs()


Can we move this to line 125? Also curious: does the generator need to set this?
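For context on the snippet under review, here is a minimal sketch of what a helper like `_set_nccl_determinism_envs()` might do. The body of the real helper is not shown in this thread, so the function name `set_nccl_determinism_envs_sketch`, the chosen variables, and their values are all assumptions. The variables themselves (`NCCL_ALGO`, `NCCL_PROTO`, `NCCL_MIN_NCHANNELS`, `NCCL_MAX_NCHANNELS`) are documented NCCL environment variables that are commonly pinned to keep the collective algorithm and reduction order fixed.

```python
import os

# Assumed settings: the actual values used by the PR are not shown here.
_ASSUMED_NCCL_ENVS = {
    "NCCL_ALGO": "Ring",        # pin one collective algorithm
    "NCCL_PROTO": "Simple",     # pin one wire protocol
    "NCCL_MIN_NCHANNELS": "1",  # fix the channel count so the reduction
    "NCCL_MAX_NCHANNELS": "1",  # is not split differently across runs
}


def set_nccl_determinism_envs_sketch(env=os.environ, overrides=None):
    """Write the assumed NCCL settings into `env` and return what was set.

    Hypothetical stand-in for `_set_nccl_determinism_envs()`.
    `overrides` lets a caller replace individual values.
    """
    values = {**_ASSUMED_NCCL_ENVS, **(overrides or {})}
    for key, value in values.items():
        env[key] = value
    return values


if __name__ == "__main__":
    # Demonstrate on a plain dict so the real process environment is untouched.
    demo_env = {}
    set_nccl_determinism_envs_sketch(demo_env)
    print(demo_env)
```

NCCL reads these variables when its communicators are initialized, so a helper like this only takes effect if it runs before the process group is created. That ordering applies to any process that initializes NCCL, which may be relevant to the question about the generator.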


…ight tying (#2410)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.13.0) (oldest at bottom):

* #2395
* #2244
* #2221
* #2194
* #2191
* __->__ #2410

This is an alternative fix to #2402 (comment).

Weight updating between the trainer and the generator is completely broken because we call `reload_weights` when updating the weights. `reload_weights` performs the following steps:

- `initialize_layerwise_reload(model)`: saves the current real GPU tensors as `info.kernel_tensors` and replaces all parameters with meta tensors.
- `model.load_weights(weights_iter)`: this function is written by us and calls `set_model_state_dict`. Internally, `set_model_state_dict` tries to do `param.data.copy_(loaded_weight)` for each parameter. When the parameters are meta tensors, the copy is a no-op, so the weights never get updated.

In this PR:

- Completely bypass `reload_weights`, and don't load from a file when updating the weights.
- Get the model via `self.engine.model_executor.driver_worker.get_model()`.
- Iterate over `model.named_parameters()` to find the matching parameter by name.
- Call `param.data.copy_(new_tensor)` directly.
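The per-parameter copy described in the last three bullets can be sketched roughly as follows. This is an illustration only, not the code from #2410: the helper name `copy_weights_in_place` and the `new_weights` dict are hypothetical, and the sketch uses a plain `nn.Module` rather than a model obtained from the vLLM engine.

```python
import torch
from torch import nn


def copy_weights_in_place(model: nn.Module, new_weights: dict) -> list:
    """Copy tensors from `new_weights` into same-named parameters of `model`.

    Hypothetical helper. Returns the names of the parameters that were updated.
    """
    updated = []
    with torch.no_grad():
        for name, param in model.named_parameters():
            new_tensor = new_weights.get(name)
            if new_tensor is None:
                continue  # no update supplied for this parameter
            if param.is_meta:
                # A meta tensor has no storage, so a copy into it would not
                # update any real weights; fail loudly instead.
                raise RuntimeError(f"parameter {name!r} is a meta tensor")
            # In-place copy into the existing storage; copy_ handles
            # dtype and device conversion of the source tensor.
            param.data.copy_(new_tensor)
            updated.append(name)
    return updated


if __name__ == "__main__":
    model = nn.Linear(2, 2)
    names = copy_weights_in_place(
        model, {"weight": torch.ones(2, 2), "bias": torch.zeros(2)}
    )
    print(names)
```

Because the copy writes into the parameters' existing storage, anything that already holds a reference to those tensors sees the new values, and no parameter object is replaced. The explicit `is_meta` check is an addition in this sketch to surface the silent failure mode described above.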


Update

a7bbfbb

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 19, 2026

enable batch-invariant mode in RL loop

96f3544

ghstack-source-id: ae5abed Pull-Request: #2395

pytorch-bot bot added the ciflow/8gpu label Feb 19, 2026

wwwjn mentioned this pull request Feb 19, 2026

[rl] Use torchtitan config system for inference and simple GRPO #2191

Open

meta-cla bot added the CLA Signed label Feb 19, 2026

This was referenced Feb 14, 2026

[rl] refactor torchtitan model registery in vllm #2194

Open

[rl] refactor save and load model weights using DCP #2221

Open

[rl] Generator enables TP, using torchtitan as Trainer, add grader for reward calculation #2244

Open

wwwjn changed the title ~~enable batch-invariant mode in RL loop~~ [WIP][rl] enable batch-invariant mode in RL loop Feb 19, 2026

acisseJZhong reviewed Feb 19, 2026

Update

0bd3adf

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 20, 2026

enable batch-invariant mode in RL loop

a9699bc

ghstack-source-id: bcacf2b Pull-Request: #2395

wwwjn mentioned this pull request Feb 20, 2026

[rl] bypass reload_weights by manually copy weights per-param, fix weight tying #2410

Merged

Update

150fb72

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 21, 2026

enable batch-invariant mode in RL loop

4f1fb95

ghstack-source-id: 585ee6c Pull-Request: #2395

Update

1016c3c

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 23, 2026

enable batch-invariant mode in RL loop

907d5ba

ghstack-source-id: d50af97 Pull-Request: #2395

Update

baf4d44

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 23, 2026

enable batch-invariant mode in RL loop

6514ffc

ghstack-source-id: 1de5a75 Pull-Request: #2395

Update

8363c43

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 23, 2026

enable batch-invariant mode in RL loop

7bf33c2

ghstack-source-id: 4778303 Pull-Request: #2395

Update

cbd8c6a

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 24, 2026

enable batch-invariant mode in RL loop

24fa260

ghstack-source-id: e49afbe Pull-Request: #2395

Update

8013485

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 24, 2026

enable batch-invariant mode in RL loop

5bd299e

ghstack-source-id: 05604f4 Pull-Request: #2395

Update

8d38aa0

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 24, 2026

enable batch-invariant mode in RL loop

5c547c7

ghstack-source-id: 51cc080 Pull-Request: #2395

Update

18aa282

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 24, 2026

enable batch-invariant mode in RL loop

4fbe9ea

ghstack-source-id: 1f2c104 Pull-Request: #2395

Update

64e0479

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 24, 2026

enable batch-invariant mode in RL loop

455058b

ghstack-source-id: 6ef380d Pull-Request: #2395

Update

0acc84d

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Feb 25, 2026

enable batch-invariant mode in RL loop

b97551f

ghstack-source-id: 5b576bd Pull-Request: #2395

wwwjn wants to merge 12 commits into gh/wwwjn/12/base from gh/wwwjn/12/head
