
[rl] refactor save and load model weights using DCP#2221

Open
wwwjn wants to merge 21 commits into gh/wwwjn/6/base from gh/wwwjn/6/head

Conversation


@wwwjn wwwjn commented Jan 13, 2026

Stack from ghstack (oldest at bottom):

What's new in this PR

  • Directly pass weights as tensors (plain tensors) from the trainer to the generator.
  • Remove the burden of writing to and reading from files.
  • Supported setup: the trainer supports DDP; the generator only supports TP=1 (no DTensor on either side yet).

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 13, 2026
wwwjn added a commit that referenced this pull request Jan 13, 2026
ghstack-source-id: bcd9f5e
Pull Request resolved: #2221
@wwwjn wwwjn changed the title refactor save and load model weights using DCP [WIP] refactor save and load model weights using DCP Jan 13, 2026
wwwjn added a commit that referenced this pull request Jan 13, 2026
ghstack-source-id: b7642b4
Pull Request resolved: #2221
wwwjn added a commit that referenced this pull request Jan 13, 2026
ghstack-source-id: b7642b4
Pull Request resolved: #2221
wwwjn added a commit that referenced this pull request Jan 14, 2026
ghstack-source-id: 87a29dc
Pull Request resolved: #2221
@wwwjn wwwjn changed the title [WIP] refactor save and load model weights using DCP [rl] refactor save and load model weights using DCP Jan 14, 2026

fegin commented Jan 14, 2026

I'm wondering whether we should refactor the TorchTitan checkpointer so that it can be used directly in this case. While the current PR works, if TorchTitan migrates to a new checkpoint library, other use cases will need the same updates as well. This is future work, not blocking this PR.

@wwwjn wwwjn changed the title [rl] refactor save and load model weights using DCP [WIP][rl] refactor save and load model weights using DCP Jan 30, 2026
@wwwjn wwwjn changed the title [WIP][rl] refactor save and load model weights using DCP [rl] refactor save and load model weights using DCP Feb 14, 2026
job_config=self.parallel_config,
)

# Initial load model weights from HuggingFace checkpoint path

Do we really need this complication?
IIUC we can do:

  • the trainer loads the checkpoint (which can be an HF one, thanks to the torchtitan state-dict adapters)
  • the generator only gets weights from the trainer, even the initial ones
  • the generator never needs to know about HF checkpoints or worry about the conversion

@wwwjn wwwjn Feb 25, 2026


This is to make sure we maintain the same expectation whenever/wherever you initialize an LLMEngine(model_path=<checkpointer_folder>, hf_override={"model": "TorchtitanVLLMModel"}). My "expectation" for this call is that the underlying model is initialized with the weights from the checkpoint folder.

  • vLLM achieves this "expectation" by calling TorchtitanVLLMWrapper.load_weights() when initializing the LLMEngine.
  • We achieve this "expectation" by 1) making TorchtitanVLLMWrapper.load_weights() a no-op, and 2) calling _initial_load_weights in the wrapper's __init__.

This "expectation" would help inference as well.

The approach you described would work perfectly for RL, but people might not notice that the model weights are uninitialized during plain inference. wdyt?
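The wrapper pattern described in this thread can be sketched roughly as below. This is a hypothetical, simplified illustration: the class and method names (TorchtitanVLLMWrapper, _initial_load_weights, load_weights) come from the discussion, but the body here uses a plain torch.load as a stand-in for the PR's actual HuggingFace/DCP checkpoint logic.

```python
import tempfile

import torch
import torch.nn as nn


class TorchtitanVLLMWrapper(nn.Module):
    """Sketch: load weights eagerly in __init__, make load_weights() a no-op."""

    def __init__(self, model: nn.Module, checkpoint_path: str):
        super().__init__()
        self.model = model
        # Load once here, so initializing an LLMEngine with this model
        # always yields weights from the checkpoint folder, even for
        # inference-only users who never push weights from a trainer.
        self._initial_load_weights(checkpoint_path)

    def _initial_load_weights(self, checkpoint_path: str) -> None:
        # The PR reads a HuggingFace/DCP checkpoint; a plain torch.load
        # stands in for that logic in this sketch.
        state_dict = torch.load(checkpoint_path, map_location="cpu")
        self.model.load_state_dict(state_dict)

    def load_weights(self, weights_iter):
        # Intentionally a no-op: vLLM still calls this internally during
        # engine initialization, but the real load already happened above.
        return
```

The design choice this illustrates: the eager load in __init__ keeps the "model is initialized from the checkpoint folder" expectation intact regardless of whether vLLM's own load_weights path runs.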


return self.load_weights_from_state_dict(torchtitan_state_dict)

def load_weights(self, weights_iter):

Do we still need this function? Can it be deleted or overridden with "NotImplementedError"?


This function will still be called internally by vLLM.

wwwjn added a commit that referenced this pull request Feb 22, 2026
…ight tying (#2410)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.13.0)
(oldest at bottom):
* #2395
* #2244
* #2221
* #2194
* #2191
* __->__ #2410


This is an alternative fix to #2402 (comment).

Weight updating between the trainer and generator is totally broken because we called reload_weights when updating the weights. reload_weights has the following steps:

- initialize_layerwise_reload(model): saves the current real GPU tensors
as info.kernel_tensors, and replaces all parameters with meta tensors.
- model.load_weights(weights_iter): this function is written by us and
calls set_model_state_dict. Internally, set_model_state_dict tries to do
param.data.copy_(loaded_weight) for each parameter. When the parameters
are meta tensors, this copy is a no-op, so the weights never get updated.
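The no-op described above can be demonstrated in isolation. This is an assumed minimal repro, not code from the PR: meta tensors are shape-only placeholders with no storage, so copying real data into them silently discards it.

```python
import torch

# A real tensor the trainer wants to load into the model.
loaded_weight = torch.ones(4)

# After the layerwise-reload step, parameters live on the meta device:
# shape-only placeholders with no storage behind them.
param = torch.nn.Parameter(torch.empty(4, device="meta"))

# copy_ into a meta tensor succeeds but moves no data, so the update
# silently vanishes -- this is the bug described above.
param.data.copy_(loaded_weight)
assert param.is_meta  # still on the meta device, no real values landed
```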


In this PR:

- Totally bypass reload_weights, and don't load from a file when
updating the weights
- Get the model via
self.engine.model_executor.driver_worker.get_model()
- Iterate over model.named_parameters() to find the matching parameter
by name
- Do param.data.copy_(new_tensor) directly
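The steps above can be sketched as a small helper. This is a hypothetical sketch, assuming the engine exposes the driver worker via engine.model_executor.driver_worker.get_model() as described in the PR; the function name push_weights is made up for illustration.

```python
import torch
import torch.nn as nn


def push_weights(engine, new_state_dict):
    """Copy new tensors directly into the generator's live parameters,
    bypassing reload_weights and any file-based checkpoint path."""
    model = engine.model_executor.driver_worker.get_model()
    params = dict(model.named_parameters())
    with torch.no_grad():
        for name, new_tensor in new_state_dict.items():
            param = params.get(name)
            if param is None:
                continue  # skip names absent on this side (e.g. tied weights)
            param.data.copy_(new_tensor)
```

Because the copy targets the real (non-meta) parameters held by the running engine, the update actually lands, unlike the reload_weights path described above.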

Labels

ciflow/8gpu · CLA Signed (This label is managed by the Meta Open Source bot.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants