-
Notifications
You must be signed in to change notification settings - Fork 3k
[rollout] feat: use rollout and validate parallel process #4863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[rollout] feat: use rollout and validate parallel process #4863
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces parallel processing for validation in the FullyAsyncRollouter by using a ThreadPoolExecutor. This allows validation to run asynchronously, controlled by the parallel_validate_and_rollout configuration. The changes refactor the validation logic to support both synchronous and asynchronous execution paths. My review identifies a critical issue with unhandled exceptions in a fire-and-forget asyncio task, which could lead to silent failures. I have also pointed out several high-severity issues related to Python best practices, such as placing imports inside methods, accessing private attributes, using non-idiomatic boolean comparisons, and including non-English comments, all of which impact code quality and maintainability.
Co-authored-by: Shangwei-Li <lishangwei@mail.ustc.edu.cn>
What does this PR do?
validate process and rollout process can be paralled in fully-async mode. This can reduce rollouter ilde and improve timing_s/gen speed when in short-long response_length case.
This is below profiler: (qwen2.5-Math-7b)
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
we can use
dapo_7b_math_fsdp2_8_8.sh, and add+async_training.parallel_validate_and_rollout=TruecommandAPI and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)recipesubmodule, please also update the reference to the submodule commit viagit submodule update --remoteorcd recipe && git pull origin main.