When training reaches step 78, it crashes with the error shown below. The failure is most likely caused by a mismatch in the action mask. I have only adapted this framework to my own task, so I'm wondering whether this is a common bug or a problem specific to my setup.
Traceback (most recent call last):
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 67, in main
    run_agent(config)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 79, in run_agent
    ray.get(runner.run.remote(config))
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2858, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 958, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=434664, ip=10.0.0.6, actor_id=73a93c038ea79f93bdc1c7a801000000, repr=<main_agent.TaskRunner object at 0x792c96999960>)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 202, in run
    trainer.fit()
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/agent_ray_trainer.py", line 1033, in fit
    gen_batch_output = generation_manager.run_llm_loop(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 367, in run_llm_loop
    rollings = self._update_rolling_state(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 197, in _update_rolling_state
    new_action_masks.append(action_mask + action_masks[i])
ValueError: operands could not be broadcast together with shapes (243,) (158,)
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

The training script I used:
export BASE_MODEL='Qwen/Qwen2.5-3B-Instruct'
export PROJECT_NAME='multiturn'
export EXPERIMENT_NAME=ppo-qwen2.5-3b-instruct
python3 -m agent_r1.src.main_agent \
data.train_files=['data/python_multiturn_new/qwen/finqa/train.parquet'] \
data.val_files=['data/python_multiturn_new/qwen/finqa/test.parquet'] \
data.train_batch_size=2 \
data.max_prompt_length=8192 \
data.max_response_length=8192 \
data.max_response_length_single_turn=1024 \
actor_rollout_ref.model.path=$BASE_MODEL \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=2 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=True \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.stop_token_ids=[151658] \
actor_rollout_ref.rollout.stop=[] \
actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
critic.optim.lr=1e-5 \
critic.model.use_remove_padding=True \
critic.model.path=$BASE_MODEL \
critic.model.enable_gradient_checkpointing=True \
critic.ppo_micro_batch_size_per_gpu=1 \
critic.model.fsdp_config.param_offload=True \
critic.model.fsdp_config.optimizer_offload=True \
algorithm.adv_estimator=gae \
algorithm.kl_ctrl.kl_coef=0.001 \
algorithm.use_kl_in_reward=True \
trainer.critic_warmup=3 \
trainer.logger=['console','wandb'] \
trainer.project_name=$PROJECT_NAME \
trainer.experiment_name=$EXPERIMENT_NAME \
trainer.n_gpus_per_node=1 \
trainer.nnodes=1 \
trainer.save_freq=-1 \
trainer.test_freq=5 \
trainer.total_epochs=10 \
trainer.val_before_train=True \
trainer.log_val_generations=0 \
tool.max_turns=3 \
tool.tools=['table_keyword_search'] \
tool.use_batch_tool_calls=False \
tool.val_kwargs.use_batch_tool_calls=False \
tool.max_tool_response_length=512 $@
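As a note on the suspected cause: the ValueError above is a plain NumPy broadcasting failure between two 1-D masks of different lengths. The snippet below is only a minimal sketch using the hypothetical lengths from the error message (243 vs. 158) and placeholder variable names, not Agent-R1 code; the zero-padding at the end is an untested illustration of one possible workaround, not a verified fix.

import numpy as np

# Hypothetical lengths taken from the error message: the new per-turn action
# mask has 243 entries while the stored one has 158.
action_mask = np.ones(243, dtype=np.int64)
stored_mask = np.ones(158, dtype=np.int64)

try:
    # Adding 1-D arrays of different lengths cannot be broadcast, which raises
    # the same ValueError as in the traceback above.
    _ = action_mask + stored_mask
except ValueError as err:
    print(err)  # operands could not be broadcast together with shapes (243,) (158,)

# Untested illustration of one possible workaround: zero-pad the shorter mask
# (treating the missing positions as non-action tokens) before combining.
target_len = max(len(action_mask), len(stored_mask))

def pad_to(mask, length):
    return np.pad(mask, (0, length - len(mask)), constant_values=0)

combined = pad_to(action_mask, target_len) + pad_to(stored_mask, target_len)
print(combined.shape)  # (243,)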