When training reaches step 78, it crashes with the error shown below. The failure is most likely caused by a mismatch in the action mask. I have only adapted this framework to my own task, so I'm wondering whether this is a common bug or a problem specific to my setup.
Traceback (most recent call last):
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 67, in main
    run_agent(config)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 79, in run_agent
    ray.get(runner.run.remote(config))
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2858, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 958, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=434664, ip=10.0.0.6, actor_id=73a93c038ea79f93bdc1c7a801000000, repr=<main_agent.TaskRunner object at 0x792c96999960>)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 202, in run
    trainer.fit()
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/agent_ray_trainer.py", line 1033, in fit
    gen_batch_output = generation_manager.run_llm_loop(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 367, in run_llm_loop
    rollings = self._update_rolling_state(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 197, in _update_rolling_state
    new_action_masks.append(action_mask + action_masks[i])
ValueError: operands could not be broadcast together with shapes (243,) (158,)
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

The training script I used:
export BASE_MODEL='Qwen/Qwen2.5-3B-Instruct'
export PROJECT_NAME='multiturn'
export EXPERIMENT_NAME=ppo-qwen2.5-3b-instruct
python3 -m agent_r1.src.main_agent \
data.train_files=['data/python_multiturn_new/qwen/finqa/train.parquet'] \
data.val_files=['data/python_multiturn_new/qwen/finqa/test.parquet'] \
data.train_batch_size=2 \
data.max_prompt_length=8192 \
data.max_response_length=8192 \
data.max_response_length_single_turn=1024 \
actor_rollout_ref.model.path=$BASE_MODEL \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=2 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=True \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.stop_token_ids=[151658] \
actor_rollout_ref.rollout.stop=[] \
actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
critic.optim.lr=1e-5 \
critic.model.use_remove_padding=True \
critic.model.path=$BASE_MODEL \
critic.model.enable_gradient_checkpointing=True \
critic.ppo_micro_batch_size_per_gpu=1 \
critic.model.fsdp_config.param_offload=True \
critic.model.fsdp_config.optimizer_offload=True \
algorithm.adv_estimator=gae \
algorithm.kl_ctrl.kl_coef=0.001 \
algorithm.use_kl_in_reward=True \
trainer.critic_warmup=3 \
trainer.logger=['console','wandb'] \
trainer.project_name=$PROJECT_NAME \
trainer.experiment_name=$EXPERIMENT_NAME \
trainer.n_gpus_per_node=1 \
trainer.nnodes=1 \
trainer.save_freq=-1 \
trainer.test_freq=5 \
trainer.total_epochs=10 \
trainer.val_before_train=True \
trainer.log_val_generations=0 \
tool.max_turns=3 \
tool.tools=['table_keyword_search'] \
tool.use_batch_tool_calls=False \
tool.val_kwargs.use_batch_tool_calls=False \
tool.max_tool_response_length=512 $@
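As a note on the suspected cause: the ValueError above is a plain NumPy broadcasting failure between two 1-D masks of different lengths. The snippet below is only a minimal sketch using the hypothetical lengths from the error message (243 vs. 158) and placeholder variable names, not Agent-R1 code; the zero-padding at the end is an untested illustration of one possible workaround, not a verified fix.

import numpy as np

# Hypothetical lengths taken from the error message: the new per-turn action
# mask has 243 entries while the stored one has 158.
action_mask = np.ones(243, dtype=np.int64)
stored_mask = np.ones(158, dtype=np.int64)

try:
    # Adding 1-D arrays of different lengths cannot be broadcast, which raises
    # the same ValueError as in the traceback above.
    _ = action_mask + stored_mask
except ValueError as err:
    print(err)  # operands could not be broadcast together with shapes (243,) (158,)

# Untested illustration of one possible workaround: zero-pad the shorter mask
# (treating the missing positions as non-action tokens) before combining.
target_len = max(len(action_mask), len(stored_mask))

def pad_to(mask, length):
    return np.pad(mask, (0, length - len(mask)), constant_values=0)

combined = pad_to(action_mask, target_len) + pad_to(stored_mask, target_len)
print(combined.shape)  # (243,)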