Open
Description
I am trying to run Agent-R1 on HiPerGator using the B200 GPU partition (hpg-b200). The B200 GPUs require PyTorch ≥2.7.0 built against CUDA 12.8 for sm_100 architecture support. However, the Verl and vLLM versions that Agent-R1 depends on were pinned for Torch 2.3.0, which creates dependency conflicts.
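For anyone reproducing this, a quick way to confirm that an installed PyTorch build was compiled for the B200 architecture is sketched below. It is only a diagnostic sketch: `build_supports_arch` is a hypothetical helper name, and `torch.cuda.get_arch_list()` returns an empty list when CUDA is not available on the node, so it should be run on a GPU node.

```python
def build_supports_arch(arch_list, arch="sm_100"):
    """Return True if `arch` appears in the list of compiled CUDA architectures."""
    return arch in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # get_arch_list() reports the architectures this torch build was compiled for.
        # It returns an empty list when CUDA is unavailable on the current node.
        archs = torch.cuda.get_arch_list()
        print("torch", torch.__version__, "CUDA", torch.version.cuda)
        print("compiled architectures:", archs)
        print("sm_100 supported:", build_supports_arch(archs))
```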
Steps to Reproduce
- Create a clean conda env with Python 3.10.
- Install Verl (0.2.0.dev0) as per project instructions.
- Install PyTorch stack for B200:
pip install torch==2.7.0+cu128 torchvision==0.22.0+cu128 torchaudio==2.7.0+cu128 --index-url https://download.pytorch.org/whl/cu128
- Try to install tensordict, xformers, and vllm.
Actual Results
- verl 0.2.0.dev0 requires tensordict<=0.6.2.
- torch==2.7.0 requires tensordict>=0.7.0.
- vllm 0.4.x (used in Agent-R1) only works with torch==2.3.0.
- vllm >=0.10.x works with Torch 2.7.0, but introduces breaking API changes (model_hf_config removed).
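The tensordict conflict above cannot be resolved by any single version, because the two bounds do not overlap. The sketch below illustrates that with plain tuple comparison. The helper names are hypothetical, and the parser only handles simple numeric versions such as `0.6.2`, not full PEP 440 version strings.

```python
def parse_version(text):
    """Turn a plain numeric version such as '0.6.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def range_is_empty(lower_inclusive, upper_inclusive):
    """Return True when no version can satisfy both >=lower and <=upper."""
    return parse_version(lower_inclusive) > parse_version(upper_inclusive)


# Lower bound reported for torch 2.7.0, upper bound from verl 0.2.0.dev0.
print(range_is_empty("0.7.0", "0.6.2"))  # True
```

Since the range is empty, one of the two pins has to be relaxed before the environment can resolve.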
Example error
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
verl 0.2.0.dev0 requires tensordict<=0.6.2, which is not installed.
torchaudio 2.7.0+cu128 requires torch==2.7.0, but you have torch 2.7.1 which is incompatible.
torchvision 0.22.0+cu128 requires torch==2.7.0, but you have torch 2.7.1 which is incompatible.
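The error shows torch 2.7.1 in the environment even though 2.7.0 was requested, so it may help to list what actually ended up installed after each step. A minimal sketch using only the standard library follows. The package list mirrors the names mentioned in this report, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the steps and error output in this report.
PACKAGES = ["torch", "torchvision", "torchaudio", "tensordict", "xformers", "vllm", "verl"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:12s} {ver or 'not installed'}")
```

Running this after each install step shows which step replaced torch 2.7.0 with 2.7.1.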
Expected Results
Ability to install and run Agent-R1/Verl with PyTorch 2.7.0 + CUDA 12.8 (for B200 GPUs) without dependency conflicts.
CC
@lyumengxian