
🛠️🤖 MatchTIR: Fine-Grained Supervision for
Tool-Integrated Reasoning via Bipartite Matching

Paper Dataset Model Python 3.10+

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

📣 Latest News

  • [Jan 15, 2026]: 📄 Our paper is now available on arXiv and Hugging Face Daily Paper.
  • [Jan 14, 2026]: 🔥 We released all our MatchTIR model checkpoints and datasets. Check out 🤗 MatchTIR here.
  • [Jan 14, 2026]: 🚀 Full codebase of MatchTIR released.

📦 Dataset & Model Zoo

Dataset                 Download
FTRL Training Data      🤗 HuggingFace

Model                   Download
Qwen3-8B-MatchTIR-KM    🤗 HuggingFace
Qwen3-4B-MatchTIR-KM    🤗 HuggingFace
Qwen3-8B-MatchTIR-OT    🤗 HuggingFace
Qwen3-4B-MatchTIR-OT    🤗 HuggingFace

💡 Overview

We propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation.
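The turn-level reward assignment can be pictured with a minimal sketch: assuming each predicted tool-call turn is scored against each reference turn, a Kuhn-Munkres (Hungarian) matching gives each predicted turn the score of its matched reference, and zero if it matched nothing. The function name and similarity matrix below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of bipartite-matching reward assignment -- NOT the
# released MatchTIR code. `turn_rewards` and the similarity matrix are
# hypothetical names for illustration only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def turn_rewards(similarity):
    """similarity[i][j]: score between predicted turn i and reference turn j.
    Returns one reward per predicted turn: matched turns keep their
    similarity score; unmatched turns receive 0."""
    sim = np.asarray(similarity, dtype=float)
    # Negate to turn the min-cost assignment into max-similarity matching.
    rows, cols = linear_sum_assignment(-sim)
    rewards = np.zeros(sim.shape[0])
    rewards[rows] = sim[rows, cols]
    return rewards

# Three predicted turns, two reference turns: one prediction stays unmatched.
rewards = turn_rewards([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.3, 0.4]])
```

With this input, turns 0 and 1 match references 0 and 1 respectively, and the spurious third turn gets zero reward, which is the fine-grained signal a trajectory-level reward cannot provide.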

📊 Overall Performance

🛠️ Setup

🛠️ Environment

  • Run the following commands to install the required packages.
    # Create conda environment
    conda create -n MatchTIR python=3.10
    conda activate MatchTIR
    
    # Install requirements
    cd MatchTIR-main
    pip install -r requirements.txt

🤖 Model

📊 Benchmarks

  • Download the benchmarks:
    • FTRL: Designed for evaluating tool-integrated reasoning under automatically constructed local execution environments.
    • BFCL: A comprehensive and rigorous benchmark designed to evaluate the function-calling capabilities of LLMs across a wide range of scenarios.
    • ToolHop: Designed to evaluate LLMs in multi-hop tool-use scenarios.

FTRL is used for training and in-domain evaluation, while BFCL and ToolHop are adopted for out-of-domain evaluation to assess generalization.

⚙️ Training Configuration

You can adjust the hyperparameters in Scripts/run.sh:

  • --custom_reward_function.name: Choose between compute_process_KM (Hard) or compute_process_ot (Soft).
  • --actor_rollout_ref.model.path: Path to your local LLM.
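The two reward variants differ in how predicted turns are matched to references: compute_process_KM uses a hard one-to-one Kuhn-Munkres assignment, while compute_process_ot relaxes it to a soft transport plan. A minimal entropic-regularized Sinkhorn sketch of the soft variant (the function name, uniform marginals, and regularization setup are assumptions for illustration, not the released code):

```python
# Illustrative Sinkhorn sketch of soft (OT-style) turn matching -- NOT the
# released compute_process_ot. `soft_match` is a hypothetical name.
import numpy as np

def soft_match(sim, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport between predicted and reference
    turns with uniform marginals. Returns a transport plan P where P[i, j]
    is the soft weight of matching predicted turn i to reference turn j."""
    K = np.exp(np.asarray(sim, dtype=float) / reg)  # similarity kernel
    a = np.full(K.shape[0], 1.0 / K.shape[0])       # uniform row marginal
    b = np.full(K.shape[1], 1.0 / K.shape[1])       # uniform column marginal
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                        # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

plan = soft_match([[1.0, 0.0],
                   [0.0, 1.0]])
# Mass concentrates on the high-similarity diagonal, but every pair keeps
# some weight -- the "Soft" counterpart of the hard KM assignment.
```

Unlike the hard variant, every predicted turn receives a graded share of credit from every reference turn, which smooths the reward signal when tool calls only partially overlap.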

🚀 Quick Start

🔥 Training

  • Run the shell script to perform policy optimization.
    bash Scripts/run.sh

🤖 Model Merge

  • Run the command to merge the trained model into safetensors format.
    python3 Code/merge/merge_model.py merge --local_dir ${dir} --target_dir ${dir}

🚀 Evaluation

We evaluate MatchTIR on three benchmark datasets to assess its effectiveness and generalization ability across different tool-interaction scenarios.

📊 Benchmark 1: FTRL

  • Run evaluation on FTRL with:
bash Scripts/eval_ftrl.sh

📊 Benchmark 2: BFCL

  • We use the official code provided by BFCL to perform evaluation. You can find the official code in the BFCL repo.

📊 Benchmark 3: ToolHop

  • Run evaluation on ToolHop with:
bash Scripts/eval_toolhop.sh

📄 Citation

If you find our code or work useful for your research, please cite our work.

@article{qu2026matchtir,
  title={MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching},
  author={Qu, Changle and Dai, Sunhao and Cai, Hengyi and Xu, Jun and Wang, Shuaiqiang and Yin, Dawei},
  journal={arXiv preprint arXiv:2601.10712},
  year={2026}
}

📞 Contact

For any questions or feedback, please reach out to us at changlequ@ruc.edu.cn.

☕️ Acknowledgement

We employ the VeRL 0.3.1.dev framework for training.
