Releases · alibaba/ROLL
v0.2.0 release
Hello everyone! Thank you for your interest in ROLL.
ROLL has recently gained a large number of new features. Below is a summary of the recent updates; we will continue to iterate on ROLL, and you are welcome to join the ROLL community.
🚀 Highlights:
- New model support: Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni, GLM-4.7
- Partial GPU overlap between agentic training and Rollout: idle training GPUs are switched over to Rollout
- DynamicSamplingScheduler coroutine refactoring
- New: FSDP2 Strategy
- Training supports Sequence packing and Dynamic batching
🚀 Major New Features:
- Rollout
- DynamicSamplingScheduler coroutine refactoring
- Custom rollout pre/post-processing, supporting dynamic sampling params, multi-stage generation, and ThinkingBudget control
- SGLang: strategy refactoring, supporting server mode, native onload/offload, in-flight FP8-quantized rollout, and cross-machine multi-node deployment
- vLLM: DP/EP support, supports vllm==0.12.0
- Provides an AgentNative Rollout paradigm (AgentNativeStepEnvManager + SokobanNativeEnv), with context fully managed by the env
- Async Rollout Hang Detect: added asynchronous Rollout hang detection to quickly locate problematic envs (see the hang-detection sketch after this list)
- Supports rollout dump & mock, making it easier to align forward/train-phase numerical precision (see the dump/mock sketch after this list)
- Agentic pipeline supports train-val/rollout overlap
- Training
- FSDP2
- Megatron supports LoRA; LoRA RL blog: https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- Save model parameters in HF format online during Megatron training
- Support FP8 training for Megatron Strategy
- Sequence packing, with a refined loss_func interface definition
- Dynamic batching (see the packing/batching sketch after this list)
- Add DeepSpeed SFT support
- Model update implementation optimizations: eliminate inter-machine redundancy, overlap weight conversion with NCCL broadcast, optimize host-to-device transfer, and switch multiple PP stages from serial synchronization to a lock-based mode that synchronizes them simultaneously
- Asynchronous Feature
- Partial GPU overlap between training and Rollout: idle training GPUs are switched to Rollout; report: https://arxiv.org/abs/2512.24873
- Agentic off-policy loss with importance sampling (IS) correction (see the IS-loss sketch after this list)
- Pipeline recipe
- VLM image tool use (DeepEyes), with tool invocation overlapped with reward calculation
- Models: New model support for Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni-Thinker, GLM-4.7
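The asynchronous rollout hang detection mentioned above amounts to a watchdog over concurrently running env episodes: each env coroutine is wrapped in a timeout, and any env that exceeds it is reported with its identifier so the problematic env can be located quickly. A minimal asyncio sketch of that pattern, assuming nothing about ROLL's env manager interface; the timeout and env function are illustrative.

```python
import asyncio

ROLLOUT_TIMEOUT_S = 1.0  # illustrative; a real rollout would use a much larger timeout


async def run_env_episode(env_id: int, delay: float) -> str:
    """Stand-in for one env's rollout episode; `delay` simulates generation time."""
    await asyncio.sleep(delay)
    return f"env {env_id} finished"


async def rollout_with_hang_detect(delays: list[float]) -> None:
    """Run all env episodes concurrently and flag any that exceeds the timeout."""

    async def guarded(env_id: int, delay: float) -> None:
        try:
            result = await asyncio.wait_for(run_env_episode(env_id, delay), ROLLOUT_TIMEOUT_S)
            print(result)
        except asyncio.TimeoutError:
            print(f"[hang detect] env {env_id} exceeded {ROLLOUT_TIMEOUT_S}s, inspect this env")

    await asyncio.gather(*(guarded(i, d) for i, d in enumerate(delays)))


# Usage: env 2 "hangs" (sleeps past the timeout) and is reported instead of blocking the run.
asyncio.run(rollout_with_hang_detect([0.1, 0.2, 5.0]))
```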
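The rollout dump & mock feature is about reproducibility when aligning rollout-time and train-time numerics: persist each rollout batch (token ids, rollout log-probs, rewards) to disk, then replay the dumped batches instead of calling the inference engine, so a forward/train log-prob mismatch can be reproduced deterministically. A minimal sketch of that pattern; the file layout and batch fields are illustrative, not ROLL's actual dump format.

```python
import os

import torch

DUMP_DIR = "rollout_dumps"  # illustrative location


def dump_rollout(step: int, batch: dict) -> None:
    """Persist one rollout batch so the train phase can later be replayed offline."""
    os.makedirs(DUMP_DIR, exist_ok=True)
    torch.save(batch, os.path.join(DUMP_DIR, f"rollout_step{step:06d}.pt"))


def mock_rollout(step: int) -> dict:
    """Replay a previously dumped batch instead of querying the inference engine."""
    return torch.load(os.path.join(DUMP_DIR, f"rollout_step{step:06d}.pt"))


# Usage: dump during a real run, then rerun the train phase in "mock" mode and compare
# its recomputed log-probs against the rollout_logprobs stored in the same batch.
batch = {
    "input_ids": torch.randint(0, 32000, (4, 128)),
    "rollout_logprobs": torch.randn(4, 128),
    "rewards": torch.randn(4),
}
dump_rollout(0, batch)
replayed = mock_rollout(0)
assert torch.equal(batch["input_ids"], replayed["input_ids"])
```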
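Sequence packing and dynamic batching share one idea: concatenate variable-length samples into a single flat token buffer with cumulative-length offsets (the `cu_seqlens` that varlen attention kernels consume), and cap each micro-batch by a token budget rather than a fixed sample count. The sketch below is a generic illustration of that pattern, not ROLL's implementation; `pack_sequences`, `dynamic_batches`, and `max_tokens` are illustrative names.

```python
from typing import Iterable, List, Tuple

import torch


def pack_sequences(seqs: List[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
    """Concatenate variable-length 1-D token tensors into one flat buffer.

    Returns the packed buffer and cu_seqlens (cumulative sequence lengths),
    which mark where each original sequence starts and ends.
    """
    lengths = torch.tensor([s.numel() for s in seqs], dtype=torch.int32)
    cu_seqlens = torch.zeros(len(seqs) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)
    return torch.cat(seqs, dim=0), cu_seqlens


def dynamic_batches(seqs: Iterable[torch.Tensor], max_tokens: int):
    """Greedily group sequences so each packed micro-batch stays within a token budget."""
    batch, tokens = [], 0
    for seq in seqs:
        n = seq.numel()
        if batch and tokens + n > max_tokens:
            yield pack_sequences(batch)
            batch, tokens = [], 0
        batch.append(seq)
        tokens += n
    if batch:
        yield pack_sequences(batch)


# Usage: three samples of different lengths, packed under a 512-token budget.
samples = [torch.randint(0, 1000, (n,)) for n in (120, 300, 450)]
for packed, cu_seqlens in dynamic_batches(samples, max_tokens=512):
    print(packed.shape, cu_seqlens.tolist())
```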
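The agentic off-policy loss with IS correction weights the policy-gradient loss by a truncated importance ratio between the current policy and the (stale) policy that produced the rollout. A minimal token-level sketch of such a truncated-IS policy gradient; the function name, truncation value, and masking convention are assumptions for illustration, not ROLL's exact loss.

```python
import torch


def is_corrected_pg_loss(
    logprobs: torch.Tensor,      # (batch, seq) log-probs of taken tokens under the current policy
    old_logprobs: torch.Tensor,  # (batch, seq) log-probs under the rollout (behavior) policy
    advantages: torch.Tensor,    # (batch, seq) per-token advantages
    mask: torch.Tensor,          # (batch, seq) 1 for response tokens, 0 elsewhere
    rho_clip: float = 5.0,       # truncation of the importance weight (illustrative default)
) -> torch.Tensor:
    # rho = pi_current / pi_behavior, truncated to bound the variance introduced by
    # stale (off-policy) rollouts. rho is detached, so the gradient flows only through
    # logprobs, giving the IS-corrected policy gradient -rho * A * grad(log pi).
    rho = torch.exp(logprobs - old_logprobs).detach().clamp(max=rho_clip)
    per_token_loss = -rho * advantages * logprobs
    return (per_token_loss * mask).sum() / mask.sum().clamp(min=1)


# Usage with dummy tensors (the current-policy log-probs carry the gradient).
b, t = 2, 8
cur_logprobs = torch.randn(b, t, requires_grad=True)
loss = is_corrected_pg_loss(cur_logprobs, torch.randn(b, t), torch.randn(b, t), torch.ones(b, t))
loss.backward()
```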
v0.1.3 release
🚀 Highlights:
- (feat): support Qwen3VL, mcore_adapter and examples.
- (feat): Add optimization for computing ref_logprobs and old_logprobs.
- (feat): support vllm beam_search.
- (feat): Add support for Qwen-3-next on AMD GPUs.
- (feat): support sglang==0.5.4, vllm==0.11.1, torch==2.8.0.
🚀 Major New Features:
- Agentic
- (fix): fix agentic val get_batch state in redundant envs.
- (feat): agentic-spec actor worker.
- (feat): add infer_log_probs in agentic.
- (feat): refactor agentic norm like LitePPO.
- (feat): add agentic profile metrics.
- Models and Backends
- (feat): support vllm beam_search.
- (feat): Add support for Qwen-3-next on AMD GPUs.
- (feat): support offloading NCCL to save GPU memory. Thanks to slime.
- (feat): support sglang==0.5.4.
- (feat): sglang support dp-attention.
- (feat): add enable_reference option. #250
- (feat): add enable_old_logprobs, optimizing old log-prob computation with a cache (see the caching sketch after this list).
- (feat): support Qwen3VL, mcore_adapter and examples yaml. #190
- (feat): add sequence packing for the SFT and distill pipelines, optimizing memory usage during top-k logits computation.
- Bug Fixes and Refactoring
- (fix): update math rule reward worker with thinking. #281
- (feat): set RAY_CGRAPH_get_timeout=600.
- (fix): fix train infer ratio/diff mean & add train infer ratio/diff token/seq mask & add rollout importance sampling. #242 #273
- (fix): ensure compatibility with transformers version check for causal mask update.
- (fix): fix vllm 0.11.0 import for torch 2.8.0.
- (fix): fix tokenizer mismatch between policy and reward model in llm judge reward worker. #91
- (fix): fix bugs in data fetching for face embeddings for wan_module.
- (fix): vllm _generate_standard missing prompt_token_ids input argument in vllm >0.11.0. #189
- (fix): vllm add missing argument is_lora in function update_parameter. #233
- (fix): fix bugs with metrics recording in the DPO pipeline.
- (fix): update image loading logic for byte data in rlvr_vlm_pipeline.py.
- (fix): add alive check. #253
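The enable_old_logprobs optimization above boils down to computing the old-policy log-probs once per rollout batch and caching them, instead of recomputing them on every optimization pass. A minimal sketch of that caching idea; it assumes a model callable that returns per-position logits, and all names are illustrative rather than ROLL's API.

```python
import torch


@torch.no_grad()
def compute_token_logprobs(model, input_ids: torch.Tensor) -> torch.Tensor:
    """One forward pass: log-probs of each realized next token under `model`."""
    logits = model(input_ids)                       # (batch, seq, vocab); assumed callable
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs[:, :-1].gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)


class OldLogprobsCache:
    """Compute old-policy log-probs once per rollout batch and reuse them afterwards."""

    def __init__(self) -> None:
        self._cache: dict[int, torch.Tensor] = {}

    def get(self, model, batch_id: int, input_ids: torch.Tensor) -> torch.Tensor:
        if batch_id not in self._cache:             # cache miss: pay the forward pass once
            self._cache[batch_id] = compute_token_logprobs(model, input_ids)
        return self._cache[batch_id]

    def evict(self, batch_id: int) -> None:
        self._cache.pop(batch_id, None)             # drop entries once the batch is consumed
```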