
[NOT4REVIEW] Make RL model compilable: ops rewrite + per-sample backward #2394

Draft
Lucaskabela wants to merge 3 commits into pytorch:main from Lucaskabela:lucaskabela/rl-compilable

Conversation

@Lucaskabela (Contributor)

Rewrites batch-invariant ops as torch.library.custom_op (rms_norm, silu_and_mul, flash_attn) so they are opaque to Dynamo/AOT autograd. Adds aten dispatch overrides for matmul/linear backward to use vLLM's deterministic kernels.

Refactors compute_policy_gradient_loss_vllm to use per-sample gradient accumulation: each sample's forward is immediately followed by backward, keeping only one set of activations in memory at a time. This is a prerequisite for torch.compile since the compiled graph processes one sample at a time with fixed-shape inputs.
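The per-sample pattern can be sketched as follows (an assumed simplification; `compute_policy_gradient_loss_vllm` itself is more involved). Each sample's backward runs before the next sample's forward, so at most one sample's activations are alive, and gradients accumulate into `.grad`:

```python
import torch

# Minimal sketch of per-sample gradient accumulation. The function name and
# signature are illustrative, not the PR's API.
def per_sample_backward(model, samples, loss_fn, loss_scale):
    total_loss = 0.0
    for sample in samples:
        loss = loss_fn(model(sample)) * loss_scale  # forward for one sample
        loss.backward()  # frees this sample's activations immediately
        total_loss += loss.detach().item()
    return total_loss
```

Because each `backward()` accumulates into existing `.grad` buffers, `zero_grad()` must run once before the loop rather than after the loss, which matches the trainer.py change below.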

Changes:

  • batch_invariant_backward.py: custom ops rewrite
  • models/attention.py: custom op for flash_attn
  • simple_rl.py: per-sample backward, loss_scale param, timing metrics
  • trainer.py: move zero_grad before loss, remove loss.backward()
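
The aten dispatch override in batch_invariant_backward.py might look roughly like this sketch. The replacement kernel here is a naive broadcast-sum matmul (an assumption for illustration, with a fixed reduction order standing in for a batch-invariant kernel); the PR routes to vLLM's deterministic kernels instead:

```python
import torch

# Hedged sketch of overriding an aten op's kernel via torch.library.Library.
_lib = torch.library.Library("aten", "IMPL")

def _deterministic_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Written with broadcasting so it never re-enters aten::mm; the explicit
    # sum fixes the reduction order.
    return (a.unsqueeze(-1) * b.unsqueeze(0)).sum(dim=1)

# Route aten::mm on CPU to the replacement kernel (CUDA in the real code).
_lib.impl("mm", _deterministic_mm, "CPU")
```

After registration, any `torch.mm` call on that dispatch key goes through the override, including the mm calls generated by matmul/linear backward.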

Authored with Claude

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 18, 2026
@Lucaskabela force-pushed the lucaskabela/rl-compilable branch 2 times, most recently from 57ffef9 to e10c3b7, on February 19, 2026 at 19:17
@Lucaskabela force-pushed the lucaskabela/rl-compilable branch from e10c3b7 to 575a33d on February 23, 2026 at 16:27

Labels: ciflow/8gpu, CLA Signed