feat(verl): add unexpected tool call filtering #467
iamseungpil wants to merge 1 commit into microsoft:main
@microsoft-github-policy-service agree company="Gwangju Institute of Science and Technology"
Add filtering for "unexpected tool call" turns, where the model continues generating after a tool call instead of stopping at </tool_call><|im_end|>. This helps prevent entropy explosion during GRPO training.

Changes:
- daemon.py: Add _setup_tool_call_filter(), _count_invalid_turns(), _filter_invalid_turns(), and void turn filtering
- config.yaml: Add filter_unexpected_tool_calls option (default: False)
- trainer.py: Fix missing gts parameter in _dump_generations()
- examples/calc_x/train_calc_agent.py: Add --filter-unexpected-tool-calls CLI flag

Key improvements over Youtu branch:
- Uses apply_chat_template() for model-agnostic token detection
- Supports multiple valid endings (eos_token, pad_token variants)
- Uses calculator tool example for calc-x consistency

Reference: contrib/youtu-agent-lightning branch
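To make the filtering rule above concrete, here is a minimal sketch of how a turn could be classified at the token-ID level. It is not the code from daemon.py: the function names (`find_last`, `is_unexpected_tool_call`, `filter_turns`) and the toy token IDs are hypothetical, and it assumes that a turn counts as unexpected when anything other than one of the valid endings follows the last tool-call end marker.

```python
from typing import List, Sequence, Tuple


def find_last(haystack: Sequence[int], needle: Sequence[int]) -> int:
    """Return the start index of the last occurrence of needle, or -1."""
    n = len(needle)
    if n == 0:
        return -1
    for start in range(len(haystack) - n, -1, -1):
        if list(haystack[start:start + n]) == list(needle):
            return start
    return -1


def is_unexpected_tool_call(
    response_ids: Sequence[int],
    tool_call_end_ids: Sequence[int],
    valid_endings: Sequence[Sequence[int]],
) -> bool:
    """True if tokens after the last tool-call end are not a valid ending."""
    start = find_last(response_ids, tool_call_end_ids)
    if start < 0:
        # No tool call in this turn, so there is nothing to flag.
        return False
    suffix = list(response_ids[start + len(tool_call_end_ids):])
    return all(suffix != list(ending) for ending in valid_endings)


def filter_turns(
    turns: Sequence[Sequence[int]],
    tool_call_end_ids: Sequence[int],
    valid_endings: Sequence[Sequence[int]],
) -> Tuple[List[Sequence[int]], int]:
    """Drop flagged turns; return the kept turns and the number removed."""
    kept = [
        turn
        for turn in turns
        if not is_unexpected_tool_call(turn, tool_call_end_ids, valid_endings)
    ]
    return kept, len(turns) - len(kept)


# Toy IDs (assumptions): [7, 8] stands in for </tool_call>, 9 for <|im_end|>, 0 for padding.
TOOL_CALL_END = [7, 8]
VALID_ENDINGS = [[9], [9, 0]]
example_turns = [
    [1, 2, 7, 8, 9],        # stops right after the tool call
    [1, 2, 7, 8, 3, 4, 9],  # keeps generating after the tool call
    [1, 2, 3, 9],           # plain answer without a tool call
]
kept_turns, n_removed = filter_turns(example_turns, TOOL_CALL_END, VALID_ENDINGS)
```

In a real setup the marker and ending IDs would come from the tokenizer rather than being hard-coded; the commit message mentions apply_chat_template() for that purpose.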
7a5ee47 to 555d1fc
and self.trace_aggregator.get("debug", False)
else {}
),
"training/n_unexpected_tool_calls": n_unexpected_tool_calls,
Small comment: only make these logging metrics visible when self.tool_parser is not None.
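One way to read this suggestion is sketched below. The helper name `build_tool_call_metrics` and its arguments are hypothetical, and the ratio is assumed to be unexpected turns divided by total turns; the actual definition in the PR may differ.

```python
from typing import Any, Dict, Optional


def build_tool_call_metrics(
    tool_parser: Optional[Any],
    n_unexpected_tool_calls: int,
    n_turns: int,
) -> Dict[str, float]:
    """Return tool-call metrics only when a tool parser is configured."""
    if tool_parser is None:
        # Without a parser nothing was counted, so emit no metric keys at all.
        return {}
    ratio = n_unexpected_tool_calls / n_turns if n_turns > 0 else 0.0
    return {
        "training/n_unexpected_tool_calls": n_unexpected_tool_calls,
        "training/unexpected_tool_call_ratio": ratio,
    }


# The result can be merged into a larger metrics dict with **, so the keys
# simply do not appear in the logs when the filter is disabled.
metrics = {"training/n_turns": 8, **build_tool_call_metrics(None, 0, 8)}
```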
import agentlightning as agl
from agentlightning.env_var import LightningEnvVar, resolve_bool_env_var, resolve_str_env_var

# Ensure venv bin is in PATH (needed for uvx/mcp-server-calculator in Ray workers)
Some unnecessary changes to this file. Only related config should be included here I think.
filter_unexpected_tool_calls: bool = False,
experiment_name: Optional[str] = None,
n_gpus: int = 1,
checkpoint_dir: str = "/home/jovyan/msra/experiments/checkpoints",
Could you please explain this line? It seems that this path belongs to someone else.
"--checkpoint-dir",
type=str,
default="/home/jovyan/msra/experiments/checkpoints",
help="Directory to save checkpoints (default: /home/jovyan/msra/experiments/checkpoints)",
Thank you for your careful review and for raising this question.
To clarify, /home/jovyan is not a specific person's directory—it is the default home directory name on the OpenHPC server provided by my university (GIST). The msra folder is my personal working directory that I created specifically for this project, which is also linked to my GitHub repository.
I have attached screenshots of my university's HPC-AI Service Portal as evidence. As you can see, /home/jovyan is the default home directory automatically assigned when a workspace is created on this server.
I attached the training code without modification because I wanted to transparently show exactly how the experiments were conducted. However, I realize now that I should have cleaned up these internal file paths before submission. I apologize for any confusion this may have caused—this is my first time collaborating with an industry partner, and I was not aware this could raise concerns.
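As an illustration of what cleaning up the path could look like, the sketch below gives --checkpoint-dir a machine-independent default. It is only one possible approach, not a change from this PR: the `CHECKPOINT_DIR` environment variable and the `build_parser` helper are hypothetical names, and only the two flags discussed in this thread are shown.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose checkpoint default does not depend on one machine."""
    parser = argparse.ArgumentParser(description="Train the calc-x agent (sketch).")
    parser.add_argument(
        "--filter-unexpected-tool-calls",
        action="store_true",
        help="Filter turns that keep generating after a tool call.",
    )
    # Hypothetical convention: read CHECKPOINT_DIR if set, else use ./checkpoints.
    default_dir = os.environ.get("CHECKPOINT_DIR", os.path.join(".", "checkpoints"))
    parser.add_argument(
        "--checkpoint-dir",
        type=str,
        default=default_dir,
        help="Directory to save checkpoints (default: %(default)s)",
    )
    return parser


args = build_parser().parse_args(["--filter-unexpected-tool-calls"])
```

With this shape, a user on a shared HPC system can still point at a personal directory by passing --checkpoint-dir explicitly, without that path appearing in the repository.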
Summary
- Filter "unexpected tool call" turns, where the model continues generating instead of stopping at the tool call ending (</tool_call><|im_end|>)
- Add training/unexpected_tool_call_ratio metric for monitoring
- Fix missing gts parameter in validation data dump

Configuration
YAML (agentlightning/verl/config.yaml):
CLI:
python examples/calc_x/train_calc_agent.py --filter-unexpected-tool-calls

Verification
cd examples/calc_x
Filter OFF (baseline)
Filter ON