
Conversation

@jreiml (Contributor) commented on Jan 10, 2026

What does this PR do?

Adds automatic tensor parallel (TP) resharding support when training and inference use different TP configurations. This fixes weight loading failures in fully async workflows where Megatron exports full HuggingFace weights but vLLM expects TP-sharded weights.

Related to #400, #1063, #4497

Problem

When using fully async training with:

  • Megatron training: tensor_model_parallel_size=8, use_mbridge=True
  • vLLM inference: tensor_model_parallel_size=32

Weight synchronization fails because:

  1. mbridge exports full (unsharded) HuggingFace-format weights
  2. vLLM parameters expect weights pre-sharded for TP=32
  3. Parameters without a weight_loader attribute cause assertion errors:
    AssertionError: param.shape [hidden_size/32, ...] != loaded_weight.shape [hidden_size, ...]
    

This affects large MoE models (DeepSeek-V3/R1) where training requires pipeline parallelism but inference benefits from higher TP across nodes.
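
For intuition, a toy illustration of the mismatch, assuming a square projection weight (the 7168 hidden size matches DeepSeek-V3, but the layer shape is purely illustrative):

import torch

hidden_size = 7168   # DeepSeek-V3 hidden size
infer_tp = 32        # vLLM tensor parallel degree at inference

# mbridge exports the full HuggingFace-format weight ...
full_weight = torch.empty(hidden_size, hidden_size)

# ... but each vLLM rank allocates only its TP shard of that weight.
local_param = torch.nn.Parameter(torch.empty(hidden_size // infer_tp, hidden_size))

# A parameter with no weight_loader falls back to a plain copy guarded by a shape
# assertion, which fails: torch.Size([7168, 7168]) vs torch.Size([224, 7168])
print(full_weight.shape, local_param.shape)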

Solution

  1. Add VERL_ENABLE_TP_RESHARD environment variable (opt-in to avoid breaking existing setups)
  2. Patch parameters without weight_loader to use a TP-aware loader (sketched after this list) that:
    • Detects which dimension differs by a factor of tp_size
    • Automatically shards along that dimension based on tp_rank
  3. Refactor patch_vllm_moe_model_weight_loader to handle both MoE expert patching and general TP resharding
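
As a rough illustration of step 2, a minimal sketch of such a loader follows. make_tp_aware_loader and its signature are hypothetical names for this write-up, and the logic is simplified to the single-dimension case described above; it is not the exact code in this PR.

# Minimal sketch of a TP-aware weight loader; names are illustrative only.
import torch


def make_tp_aware_loader(param_name: str, tp_rank: int, tp_size: int):
    """Return a weight_loader that shards a full HF weight for this TP rank."""

    def load_weight(param: torch.nn.Parameter, loaded_weight: torch.Tensor) -> None:
        # Shapes already match: the weight arrived pre-sharded, just copy it.
        if tuple(param.shape) == tuple(loaded_weight.shape):
            param.data.copy_(loaded_weight)
            return

        if param.dim() != loaded_weight.dim():
            raise ValueError(
                f"Rank mismatch for {param_name}: "
                f"{tuple(loaded_weight.shape)} vs {tuple(param.shape)}"
            )

        # Find the single dimension where the loaded weight is exactly tp_size
        # times larger than the parameter; that is the shard dimension.
        shard_dim = None
        for dim, (p_len, w_len) in enumerate(zip(param.shape, loaded_weight.shape)):
            if w_len == p_len:
                continue
            if w_len == p_len * tp_size and shard_dim is None:
                shard_dim = dim
            else:
                # Not a clean tp_size multiple, or more than one dimension differs.
                shard_dim = None
                break

        if shard_dim is None:
            raise ValueError(
                f"Cannot determine sharding strategy for {param_name}: "
                f"loaded {tuple(loaded_weight.shape)} vs param {tuple(param.shape)}"
            )

        # Copy only this rank's slice along the detected dimension.
        shard_size = param.shape[shard_dim]
        shard = loaded_weight.narrow(shard_dim, tp_rank * shard_size, shard_size)
        param.data.copy_(shard)

    return load_weight

In the patch, a loader like this would be attached as the weight_loader of parameters that lack one.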

Test

Validated on DeepSeek-V3-Base with:

  • 16-node Megatron training (TP=8, PP=16, EP=8)
  • 16-node vLLM inference (TP=32, EP=32)
  • Fully async DAPO training with use_mbridge=True

Before fix: Weight loading fails with shape mismatch assertions
After fix: Weights load correctly, training proceeds normally

Config used

parallelism:
  tp: 8
  pp: 16
  ep: 8
  etp: 1
  infer_tp: 32
  infer_ep: 32
  infer_dp: 1

Launch command

VERL_ENABLE_TP_RESHARD=1 \
VLLM_USE_DEEP_GEMM=1 \
VLLM_ALL2ALL_BACKEND=deepep_high_throughput \
python3 -m ta_verl.recipe.fully_async_dapo.fully_async_main \
  --config-name="deepseek_v3_20251219.yaml" \
  trainer.nnodes="16" \
  rollout.nnodes="16"

API and Usage Example

# Enable TP resharding for async rollout
VERL_ENABLE_TP_RESHARD=1 python -m verl.trainer.main ...

No config changes required. The fix automatically detects and handles TP mismatches.

Design & Code Changes

File: verl/utils/vllm/patch.py

  1. _get_tp_rank_and_size(): Helper to get vLLM's tensor parallel configuration
  2. _create_tp_aware_weight_loader(): Creates weight loaders that:
    • Pass through if shapes match
    • Auto-shard if loaded weight is tp_size times larger in one dimension
  3. patch_vllm_moe_model_weight_loader(): Extended to patch all parameters without weight_loader when VERL_ENABLE_TP_RESHARD=1 (see the sketch after this list)
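
For reference, a hedged sketch of how this wiring could look. patch_missing_weight_loaders is a hypothetical name used only here (the PR folds this into patch_vllm_moe_model_weight_loader), and the vllm.distributed imports are an assumption about the deployed vLLM version.

# Hedged sketch of the patch wiring; not the exact code in verl/utils/vllm/patch.py.
import os

import torch.nn as nn


def _get_tp_rank_and_size() -> tuple[int, int]:
    # Assumed vLLM helpers; adjust if the deployed version exposes them elsewhere.
    from vllm.distributed import (
        get_tensor_model_parallel_rank,
        get_tensor_model_parallel_world_size,
    )

    return get_tensor_model_parallel_rank(), get_tensor_model_parallel_world_size()


def patch_missing_weight_loaders(model: nn.Module) -> None:
    """Attach a TP-aware loader to every parameter that has no weight_loader."""
    # Opt-in only: existing setups keep the old behavior unless the flag is set.
    if os.environ.get("VERL_ENABLE_TP_RESHARD", "0") != "1":
        return

    tp_rank, tp_size = _get_tp_rank_and_size()
    if tp_size == 1:
        return  # nothing to reshard

    for name, param in model.named_parameters():
        if not hasattr(param, "weight_loader"):
            # make_tp_aware_loader is the loader sketched in the Solution section.
            param.weight_loader = make_tp_aware_loader(name, tp_rank, tp_size)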

Checklist Before Submitting

  • Read the Contribute Guide
  • Apply pre-commit checks
  • Add/Update documentation
  • Add unit tests (manual validation on DeepSeek-V3 scale)

Related Issues

  • #4497: MoE weight format mismatch (same root cause)
  • #708: DeepSeek R1 infrastructure project
  • #400: TP rollout + FSDP / TP actor feature request
  • #1063: RFC for the auto resharding design

@jreiml marked this pull request as draft on January 10, 2026 at 18:20

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for automatic tensor-parallel resharding of weights, which is crucial for flexible deployment scenarios where training and inference parallelism configurations differ. The implementation is clean, using an environment variable for opt-in, and refactors the existing patching logic for better clarity. The core logic for detecting the sharding dimension and applying it seems correct. I have one suggestion to improve the robustness of the error handling within the new TP-aware weight loader.

Comment on lines +216 to +224
if shard_dim is None:
    # Can't determine sharding, fall back to assertion (will fail with clear error)
    assert param.shape == loaded_weight.shape, (
        f"Cannot determine sharding strategy for {param_name}. "
        f"Loaded weight shape {loaded_weight.shape} does not match "
        f"parameter shape {param.shape} and is not a simple TP multiple."
    )
    param.data.copy_(loaded_weight)
    return

Severity: high

Using assert for error handling in this case is not fully robust, as assertions can be disabled with the -O Python flag. If disabled, the param.data.copy_(loaded_weight) line would execute and likely raise a less informative RuntimeError due to a shape mismatch, as this code path is only taken when shapes are already known to be different.

To ensure this critical error is always caught and clearly reported, it's better to explicitly raise a ValueError. This makes the code more robust and its intent clearer. The lines following the assert are also unreachable if assertions are enabled, so replacing the block simplifies the code.

Suggested change
Original:

if shard_dim is None:
    # Can't determine sharding, fall back to assertion (will fail with clear error)
    assert param.shape == loaded_weight.shape, (
        f"Cannot determine sharding strategy for {param_name}. "
        f"Loaded weight shape {loaded_weight.shape} does not match "
        f"parameter shape {param.shape} and is not a simple TP multiple."
    )
    param.data.copy_(loaded_weight)
    return

Suggested:

if shard_dim is None:
    # Can't determine sharding, so raise an error with a clear message.
    raise ValueError(
        f"Cannot determine sharding strategy for {param_name}. "
        f"Loaded weight shape {loaded_weight.shape} does not match "
        f"parameter shape {param.shape} and is not a simple TP multiple."
    )
