
Abnormal output when adapting the code to a custom task #61

@xjtupy

Description


I'd like to use the author's code to train on my own task, which is roughly as follows:

### Task objective
You are an intelligent decision-making model. Given the list of queries a user searched and the list of POIs the user clicked, find the set of queries that are missing from the database.

### Task execution
1. Find the set of irrelevant queries. Analyze the relevance between each search query and every POI in the clicked-POI list; if a search query is irrelevant to every clicked POI, it is an irrelevant query. Wrap your analysis in <no_relate_think></no_relate_think>. Save the final set of irrelevant queries as JSON wrapped in <no_relate_query></no_relate_query>, for example <no_relate_query>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</no_relate_query>
2. Generate the retrieval request. If the set of irrelevant queries is non-empty, you can call the retrieval API via <search>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</search>; it returns the list of POIs retrieved for each irrelevant query, wrapped in <poi_recall_informations></poi_recall_informations>.
3. Find the set of queries missing from the database. Analyze the relevance between each irrelevant query and every POI in its retrieved-POI list; if an irrelevant query is irrelevant to every retrieved POI, it is a query missing from the database. Wrap your analysis in <miss_query_think></miss_query_think>. Save the final set of missing queries as JSON wrapped in <answer></answer>, for example <answer>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</answer>

### Output format
Your output must follow exactly one of the two formats below.
1. When the set of irrelevant queries is non-empty:
<no_relate_think>your reasoning about irrelevant queries</no_relate_think>
<no_relate_query>the set of irrelevant queries</no_relate_query>
<search>the set of queries to send to the search tool</search>
<poi_recall_informations>the list of recalled POIs</poi_recall_informations>
<miss_query_think>your reasoning about missing queries</miss_query_think>
<answer>final answer</answer>
2. When the set of irrelevant queries is empty:
<no_relate_think>your reasoning about irrelevant queries</no_relate_think>
<no_relate_query>[]</no_relate_query>
<miss_query_think>your reasoning about missing queries</miss_query_think>
<answer>final answer</answer>

List of queries the user searched: {search_querys}
List of POIs the user clicked: {click_pois}\n
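For reference, the tagged protocol above can be parsed with a small helper like the following. This is a minimal sketch for illustration only: the tag names come from the prompt, while the function names (`extract_tag`, `parse_response`) and the demo string are my own.

```python
import json
import re

def extract_tag(text, tag):
    """Return the content of the first <tag>...</tag> block, or None."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return m.group(1).strip() if m else None

def parse_response(text):
    """Pull the structured fields out of a response that follows the prompt's format."""
    no_relate = extract_tag(text, "no_relate_query")
    answer = extract_tag(text, "answer")
    return {
        "no_relate_query": json.loads(no_relate) if no_relate else [],
        "search": extract_tag(text, "search"),  # None when format 2 (no tool call) was used
        "answer": json.loads(answer) if answer else [],
    }

# Demo: a response in the second (no irrelevant queries) format.
demo = (
    "<no_relate_think>...</no_relate_think>"
    "<no_relate_query>[]</no_relate_query>"
    "<miss_query_think>...</miss_query_think>"
    "<answer>[]</answer>"
)
print(parse_response(demo)["answer"])  # → []
```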

I'm running into the following problems:

1. During the validation phase before training, the output content and format are correct, but after the tool call returns, the model does not continue generating.

2. During training, in addition to the error above, the model also emits garbage characters and the output format is garbled. (Screenshots attached.)

My training script:

#model_type=$1

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export GPU_NUMS=8
export WANDB_API_KEY=''
export RAY_TMPDIR='/nfs/dataset-ofs-search-v1/ray_tmpdir'
export DATA_DIR='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v1/miss_mining_agent/data'

WAND_PROJECT='miss_mining_agent_v2'

BASE_MODEL=""
EXPERIMENT_NAME=""

#if [ "$model_type" = "llama3.1_8b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Meta-Llama-3.1-8B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.1-8b'
#elif [ "$model_type" = "llama3.1_8b_instruct_sft" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_llama3.1_8b'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.1-8b_sft'
#elif [ "$model_type" = "llama3.2_1b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Llama-3.2-1B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.2-1b'
#elif [ "$model_type" = "llama3.2_3b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Llama-3.2-3B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.2-3b'
#elif [ "$model_type" = "qwen2.5_7b_instruct_sft" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_qwen2.5_7b_2'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-qwen2.5_7b'
#else
#  BASE_MODEL=""
#  EXPERIMENT_NAME=""
#fi

#export BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/LLaMA-Factory/pretrain_model/Qwen2.5-7B-Instruct'
export BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_qwen2.5_7b'
export EXPERIMENT_NAME='miss_mining_agent-grpo-qwen2.5_7b_lora'

HYDRA_FULL_ERROR=1 python3 -m agent_r1.src.main_agent \
    algorithm.adv_estimator=grpo \
    data.train_files=["$DATA_DIR/train.parquet"] \
    data.val_files=["$DATA_DIR/test.parquet"] \
    data.train_batch_size=64 \
    data.val_batch_size=64 \
    data.max_prompt_length=4096 \
    data.max_response_length=4096 \
    data.max_response_length_single_turn=1024 \
    data.use_default_tool_template=False \
    actor_rollout_ref.model.path=$BASE_MODEL \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.285 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n_repeat=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.logger=['console','wandb'] \
    trainer.project_name=$WAND_PROJECT \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.n_gpus_per_node=$GPU_NUMS \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=10 \
    trainer.total_epochs=10 \
    trainer.val_before_train=True \
    trainer.log_val_generations=0 \
    tool.max_turns=2 \
    tool.tools=['dd_search'] \
    tool.env=dd_search \
    tool.max_tool_response_length=2048 \
    2>&1 | tee $EXPERIMENT_NAME.log
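When debugging the garbled outputs, it can help to check each rollout against the two tag sequences the prompt allows before looking deeper into the trainer. A minimal sketch, assuming only the tag names from the prompt; the `FORMAT_*` lists and function names are mine, not part of the repo:

```python
import re

# The two legal tag orders from the prompt: with and without a tool call.
FORMAT_WITH_SEARCH = [
    "no_relate_think", "no_relate_query", "search",
    "poi_recall_informations", "miss_query_think", "answer",
]
FORMAT_NO_SEARCH = [
    "no_relate_think", "no_relate_query", "miss_query_think", "answer",
]

def tag_sequence(text):
    """Return the ordered list of opening tags in the response."""
    return re.findall(r"<([a-z_]+)>", text)

def is_well_formed(text):
    """True iff the response matches one of the two allowed formats exactly."""
    return tag_sequence(text) in (FORMAT_WITH_SEARCH, FORMAT_NO_SEARCH)

# Example of a truncated rollout (the model stopped after the tool call).
bad = "<no_relate_think>...</no_relate_think><answer>[]</answer>"
print(is_well_formed(bad))  # → False
```

Running this over the rollouts logged during validation would separate "model stops after the tool call" (a truncated but otherwise clean tag sequence) from "format is garbled" (tags out of order or missing entirely).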

I've read through the source code and can't see anything obviously wrong. I'd appreciate it if the author could help take a look.
