
[rl] Using JobConfig as the centralized config system for inference and simple GRPO #2188

Closed
wwwjn wants to merge 3 commits into main from vllm-config

wwwjn (Contributor) commented Dec 31, 2025

  1. Add job_config.py to extend the current JobConfig. One open issue: the trainer's config and the generator's config are not symmetric, e.g. Parallelism vs. Generation.parallelism.
  2. Use the job config system as the centralized, source-of-truth config, loading it from the run_configs/qwen3_0.6b.toml file.
  3. Refactor the generator to use EngineArgs() and LLMEngine() instead of LLM().
  4. Rename simple_rl_multiprocess to simple_grpo to be more descriptive.
  5. Clean up unused code branches.

Test (trainer DDP = 2, n_generator = 1):
[Screenshot 2025-12-30 at 7 34 00 PM]

Follow-up refactors:

  • Refactor 2: vLLM model registration, using setup.py and a plugin instead of an import
  • Refactor 3: Weight updater, directly passing the state_dict (DTensor) between trainer and generator
  • Refactor 4: Use the torchtitan Trainer and modularize each component
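
As a rough idea of what Refactor 2 could look like, here is a sketch that assumes vLLM's documented plugin mechanism: an entry point in the `vllm.general_plugins` group whose target function calls `ModelRegistry.register_model`. The package name, module path, and architecture name are hypothetical and not taken from this PR.

```python
# Hypothetical module: torchtitan_vllm_plugin/__init__.py
#
# The entry point would be declared in setup.py along these lines:
#   entry_points={
#       "vllm.general_plugins": [
#           "register_titan_model = torchtitan_vllm_plugin:register",
#       ],
#   }

ENTRY_POINT_GROUP = "vllm.general_plugins"

# Hypothetical architecture name and "module:Class" path of the model wrapper.
ARCH_NAME = "TorchTitanQwen3ForCausalLM"
MODEL_PATH = "torchtitan_vllm_plugin.model:TorchTitanQwen3ForCausalLM"


def register():
    """Called by vLLM at startup once the entry point is installed."""
    # Imported lazily so that importing this module does not require vLLM.
    from vllm import ModelRegistry

    # Guard against double registration when plugins are loaded more than once.
    if ARCH_NAME not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model(ARCH_NAME, MODEL_PATH)
```

Passing the model as a "module:Class" string lets vLLM defer importing the model code until it is needed, which is the main difference from registering through a plain import.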

wwwjn requested a review from acisseJZhong December 31, 2025 03:40
meta-cla bot added the CLA Signed label Dec 31, 2025
wwwjn requested review from tianyu-l and zhxchen17 December 31, 2025 03:40


@dataclass
class Generate:
Review comment from the PR author (Contributor):

We really need a better name for this field, as well as for RL.

wwwjn (Contributor, Author) commented Jan 2, 2026

Closing in favor of #2191.

wwwjn closed this Jan 2, 2026

Labels: ciflow/8gpu, CLA Signed
