Enable torch.compile and CUDA graphs for vLLM inference #4
Draft
Lucaskabela wants to merge 1 commit into lucaskabela/compile_logic_changes from
Conversation
- Add build_compilation_config() and get_cudagraph_mode() to parallelism_utils; callers derive cudagraph_mode from tp_size via the helper and pass the string to build_compilation_config
- Add @support_torch_compile decorator to TorchTitanVLLMModelWrapper
- Add vllm_compile_and_cudagraph flag to Generator and VLLMRolloutEngine in both unified and vllm_compat paths
- Add --disable-vllm-compile-and-cudagraph CLI arg to infer.py
- Wire compilation_config and enforce_eager through all LLM instantiation sites
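The first bullet can be sketched in plain Python. This is a hypothetical illustration, not the repo's actual code: the helper names come from the PR, but the signatures, the tp_size-based mode policy, and the shape of the returned config are assumptions made purely for illustration.

```python
# Hypothetical sketch of the parallelism_utils helpers named in this PR.
# The real policy and config shape live in the repo; here we assume a
# simple tp_size-based rule purely for illustration.

def get_cudagraph_mode(tp_size: int) -> str:
    """Derive a cudagraph_mode string from the tensor-parallel size (assumed policy)."""
    # Assumption: full CUDA graphs for single-GPU inference, piecewise otherwise.
    return "FULL" if tp_size == 1 else "PIECEWISE"

def build_compilation_config(cudagraph_mode: str) -> dict:
    """Build a dict-style compilation config from a cudagraph_mode string (assumed shape)."""
    return {
        "level": 3,  # assumed: highest compilation level
        "cudagraph_mode": cudagraph_mode,
    }

# Callers derive the mode from tp_size and pass the string through:
cfg = build_compilation_config(get_cudagraph_mode(tp_size=1))
```

Keeping the tp_size-to-mode decision in one helper means every LLM instantiation site stays consistent when the policy changes.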
Force-pushed 88de8a8 to 6612d3e
Force-pushed 69fc2e0 to 054ae5e
Summary
We now enable support_torch_compile in the vLLM wrapper definition in order to improve our end-to-end training runtime. The particular changes are summarized in the list above.
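vLLM's support_torch_compile class decorator marks a module so its forward pass is compiled. The minimal stand-in below only illustrates the wrap-forward decorator pattern; it is not vLLM's implementation, and the names support_compile and ModelWrapper are invented for this sketch.

```python
# Minimal stand-in for a class decorator like vLLM's @support_torch_compile.
# NOT the real implementation -- it only shows the pattern of wrapping a
# module's forward() so compilation can be triggered lazily on first call.

def support_compile(cls):
    """Class decorator: wrap forward() with a lazy one-time compile hook."""
    original_forward = cls.forward

    def compiled_forward(self, *args, **kwargs):
        if not getattr(self, "_compiled", False):
            # Real code would invoke torch.compile on the forward here.
            self._compiled = True
        return original_forward(self, *args, **kwargs)

    cls.forward = compiled_forward
    return cls

@support_compile
class ModelWrapper:  # stand-in for the decorated vLLM wrapper
    def forward(self, x):
        return x * 2
```

Decorating the wrapper class (rather than editing each call site) keeps the compile behavior transparent to callers, which is why the PR only touches the wrapper definition.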
Test Plan
Execute
Baseline (losses/rewards)
This PR (losses/rewards)
These match exactly, showing that we preserve training stability.
Metric comparison
On top of pytorch#2398 we get the following metrics:
Of note: runtime improves significantly, cutting rollout time by almost 5x. There is no significant change in memory usage.
Authored with Claude.