- [Jan 15, 2026]: 📄 Our paper is now available on arXiv and Hugging Face Daily Paper.
- [Jan 14, 2026]: 🔥 We released all our MatchTIR model checkpoints and datasets. Check out 🤗 MatchTIR here.
- [Jan 14, 2026]: 🚀 Full codebase of MatchTIR released.
| Dataset | Download |
|---|---|
| FTRL Training Data | 🤗 HuggingFace |

| Model | Download |
|---|---|
| Qwen3-8B-MatchTIR-KM | 🤗 HuggingFace |
| Qwen3-4B-MatchTIR-KM | 🤗 HuggingFace |
| Qwen3-8B-MatchTIR-OT | 🤗 HuggingFace |
| Qwen3-4B-MatchTIR-OT | 🤗 HuggingFace |
We propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation.
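To make the idea concrete, here is a minimal sketch (not the released implementation) of turn-level reward assignment via bipartite matching, using `scipy.optimize.linear_sum_assignment` (the Kuhn-Munkres algorithm) for the hard variant; the `call_similarity` function and the reward scaling are illustrative assumptions:

```python
# Illustrative sketch of bipartite-matching turn-level reward assignment.
# NOTE: this is NOT the released MatchTIR implementation; the similarity
# function and reward scaling below are simplifying assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment


def turn_level_rewards(pred_calls, ref_calls, similarity):
    """Assign each predicted tool call a reward via one-to-one matching."""
    # Pairwise similarity matrix between predicted and reference calls.
    sim = np.array([[similarity(p, r) for r in ref_calls] for p in pred_calls])
    # The Kuhn-Munkres (Hungarian) algorithm finds the max-similarity
    # matching; linear_sum_assignment minimizes cost, so negate.
    rows, cols = linear_sum_assignment(-sim)
    # Matched turns receive their similarity as reward; unmatched turns get 0.
    rewards = np.zeros(len(pred_calls))
    rewards[rows] = sim[rows, cols]
    return rewards


# Hypothetical similarity: tool name must match, then score argument overlap.
def call_similarity(pred, ref):
    if pred["name"] != ref["name"]:
        return 0.0
    shared = set(pred["args"].items()) & set(ref["args"].items())
    return len(shared) / max(len(ref["args"]), 1)
```

The hard (KM) variant enforces a one-to-one matching as above; the soft (OT) variant can be read as relaxing that matching into an optimal-transport plan that distributes reward mass over partially matching turns.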
- Run the following commands to install the required packages:

```bash
# Create conda environment
conda create -n MatchTIR python=3.10
conda activate MatchTIR

# Install requirements
cd MatchTIR-main
pip install -r requirements.txt
```
- Download the benchmarks:
- FTRL: Designed for evaluating tool-integrated reasoning under automatically constructed local execution environments.
- BFCL: A comprehensive and rigorous benchmark designed to evaluate the function-calling capabilities of LLMs across a wide range of scenarios.
- ToolHop: Designed to evaluate LLMs in multi-hop tool-use scenarios.
FTRL is used for training and in-domain evaluation, while BFCL and ToolHop are adopted for out-of-domain evaluation to assess generalization.
We employ the VeRL 0.3.1.dev framework for training. You can adjust the hyperparameters in `Scripts/run.sh`:

- `--custom_reward_function.name`: Choose between `compute_process_KM` (Hard) or `compute_process_ot` (Soft).
- `--actor_rollout_ref.model.path`: Path to your local LLM.
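For orientation, VeRL resolves `custom_reward_function.name` to a Python function defined in the file given by `custom_reward_function.path`. A hedged stub of what such a plug-in looks like, following VeRL's documented reward-function convention (the actual signatures and bodies of the released `compute_process_KM`/`compute_process_ot` may differ):

```python
# Hedged stub of a VeRL custom reward plug-in. VeRL resolves
# custom_reward_function.name to a function in the file given by
# custom_reward_function.path; the signature below follows VeRL's
# documented convention, and the body is a placeholder -- the released
# compute_process_KM / compute_process_ot may differ in detail.
def compute_process_KM(data_source, solution_str, ground_truth, extra_info=None):
    # A real implementation would parse tool calls from the rollout
    # (solution_str), match them against the reference trace
    # (ground_truth) with KM matching, and aggregate the turn-level
    # rewards into the scalar score that VeRL expects.
    return 0.0
```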
- Run the shell script to perform policy optimization:

```bash
bash Scripts/run.sh
```
- Run the following command to merge the trained model into safetensors format:

```bash
python3 Code/merge/merge_model.py merge --local_dir ${dir} --target_dir ${dir}
```
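Once merged, the checkpoint directory can be loaded like any standard Hugging Face model; a brief usage sketch (the path is a placeholder for your `--target_dir`):

```python
# Load the merged checkpoint like any Hugging Face model; the path is a
# placeholder for the --target_dir used in the merge step above.
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "path/to/merged/checkpoint"
tokenizer = AutoTokenizer.from_pretrained(merged_dir)
model = AutoModelForCausalLM.from_pretrained(merged_dir, torch_dtype="auto")
```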
We evaluate MatchTIR on three benchmark datasets to assess its effectiveness and generalization ability across different tool-interaction scenarios.
- Run evaluation on FTRL with:

```bash
bash Scripts/eval_ftrl.sh
```

- For BFCL, we use the official evaluation code provided by BFCL. You can find it in the BFCL repo.
- Run evaluation on ToolHop with:

```bash
bash Scripts/eval_toolhop.sh
```

If you find our code or work useful for your research, please cite our work:
```bibtex
@article{qu2026matchtir,
  title={MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching},
  author={Qu, Changle and Dai, Sunhao and Cai, Hengyi and Xu, Jun and Wang, Shuaiqiang and Yin, Dawei},
  journal={arXiv preprint arXiv:2601.10712},
  year={2026}
}
```

For any questions or feedback, please reach out to us at changlequ@ruc.edu.cn.

