
🛠️🤖 MatchTIR: Fine-Grained Supervision for
Tool-Integrated Reasoning via Bipartite Matching

Paper Dataset Model Python 3.10+

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

📣 Latest News

  • [Jan 15, 2026]: 📄 Our paper is now available on arXiv and Hugging Face Daily Paper.
  • [Jan 14, 2026]: 🔥 We released all our MatchTIR model checkpoints and datasets. Check out 🤗 MatchTIR here.
  • [Jan 14, 2026]: 🚀 Full codebase of MatchTIR released.

📦 Dataset & Model Zoo

Dataset                 Download
FTRL Training Data      🤗 HuggingFace

Model                   Download
Qwen3-8B-MatchTIR-KM    🤗 HuggingFace
Qwen3-4B-MatchTIR-KM    🤗 HuggingFace
Qwen3-8B-MatchTIR-OT    🤗 HuggingFace
Qwen3-4B-MatchTIR-OT    🤗 HuggingFace

💡 Overview

We propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation.
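The turn-level reward assignment can be pictured with a minimal sketch: assuming each predicted tool-call turn is scored against each reference turn, a Kuhn-Munkres (Hungarian) matching gives each predicted turn the score of its matched reference, and zero if it matched nothing. The function name and similarity matrix below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of bipartite-matching reward assignment -- NOT the
# released MatchTIR code. `turn_rewards` and the similarity matrix are
# hypothetical names for illustration only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def turn_rewards(similarity):
    """similarity[i][j]: score between predicted turn i and reference turn j.
    Returns one reward per predicted turn: matched turns keep their
    similarity score; unmatched turns receive 0."""
    sim = np.asarray(similarity, dtype=float)
    # Negate to turn the min-cost assignment into max-similarity matching.
    rows, cols = linear_sum_assignment(-sim)
    rewards = np.zeros(sim.shape[0])
    rewards[rows] = sim[rows, cols]
    return rewards

# Three predicted turns, two reference turns: one prediction stays unmatched.
rewards = turn_rewards([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.3, 0.4]])
```

With this input, turns 0 and 1 match references 0 and 1 respectively, and the spurious third turn gets zero reward, which is the fine-grained signal a trajectory-level reward cannot provide.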

📊 Overall Performance

🛠️ Setup

🛠️ Environment

  • Run the following commands to install the required packages.
    # Create conda environment
    conda create -n MatchTIR python=3.10
    conda activate MatchTIR
    
    # Install requirements
    cd MatchTIR-main
    pip install -r requirements.txt

🤖 Model

📊 Benchmarks

  • Download the benchmarks:
    • FTRL: Designed for evaluating tool-integrated reasoning under automatically constructed local execution environments.
    • BFCL: A comprehensive and rigorous benchmark designed to evaluate the function-calling capabilities of LLMs across a wide range of scenarios.
    • ToolHop: Designed to evaluate LLMs in multi-hop tool-use scenarios.

FTRL is used for training and in-domain evaluation, while BFCL and ToolHop are adopted for out-of-domain evaluation to assess generalization.

⚙️ Training Configuration

You can adjust the hyperparameters in Scripts/run.sh:

  • --custom_reward_function.name: Choose between compute_process_KM (Hard) or compute_process_ot (Soft).
  • --actor_rollout_ref.model.path: Path to your local LLM.
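The two reward variants differ in how predicted turns are matched to references: compute_process_KM uses a hard one-to-one Kuhn-Munkres assignment, while compute_process_ot relaxes it to a soft transport plan. A minimal entropic-regularized Sinkhorn sketch of the soft variant (the function name, uniform marginals, and regularization setup are assumptions for illustration, not the released code):

```python
# Illustrative Sinkhorn sketch of soft (OT-style) turn matching -- NOT the
# released compute_process_ot. `soft_match` is a hypothetical name.
import numpy as np

def soft_match(sim, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport between predicted and reference
    turns with uniform marginals. Returns a transport plan P where P[i, j]
    is the soft weight of matching predicted turn i to reference turn j."""
    K = np.exp(np.asarray(sim, dtype=float) / reg)  # similarity kernel
    a = np.full(K.shape[0], 1.0 / K.shape[0])       # uniform row marginal
    b = np.full(K.shape[1], 1.0 / K.shape[1])       # uniform column marginal
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                        # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

plan = soft_match([[1.0, 0.0],
                   [0.0, 1.0]])
# Mass concentrates on the high-similarity diagonal, but every pair keeps
# some weight -- the "Soft" counterpart of the hard KM assignment.
```

Unlike the hard variant, every predicted turn receives a graded share of credit from every reference turn, which smooths the reward signal when tool calls only partially overlap.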

🚀 Quick Start

🔥 Training

  • Run the shell script to perform policy optimization.
    bash Scripts/run.sh

🤖 Model Merge

  • Run the command to merge the trained model into safetensors format.
    python3 Code/merge/merge_model.py merge --local_dir ${dir} --target_dir ${dir}

🚀 Evaluation

We evaluate MatchTIR on three benchmark datasets to assess its effectiveness and generalization ability across different tool-interaction scenarios.

📊 Benchmark 1: FTRL

  • Run evaluation on FTRL with:
bash Scripts/eval_ftrl.sh

📊 Benchmark 2: BFCL

  • We use the official code provided by BFCL to perform evaluation. You can find the official code in the BFCL repo.

📊 Benchmark 3: ToolHop

  • Run evaluation on ToolHop with:
bash Scripts/eval_toolhop.sh

📄 Citation

If you find our code or work useful for your research, please cite our work.

@article{qu2026matchtir,
  title={MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching},
  author={Qu, Changle and Dai, Sunhao and Cai, Hengyi and Xu, Jun and Wang, Shuaiqiang and Yin, Dawei},
  journal={arXiv preprint arXiv:2601.10712},
  year={2026}
}

📞 Contact

For any questions or feedback, please reach out to us at changlequ@ruc.edu.cn.

☕️ Acknowledgement

We employ the VeRL 0.3.1.dev framework for training.
