UCSB-NLP-Chang/HarnessLLM

HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

This repository contains the implementation of the paper HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning. We propose a framework that trains LLMs to generate test harnesses automatically.

Installation

conda create -n harness python=3.10 -y
conda activate harness
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install --no-deps -e .
conda install conda-forge::python-prctl

Evaluation

First, generate test cases with a model on a specific dataset:

bash bashes/generate.sh {model} {data} {prompt} {port}

where {prompt} is either harness or inputoutput, and {port} is the port number passed to the script. For example, to run our model on the Seen version of LCB:

bash bashes/generate.sh Shiyu-Lab/HarnessLLM_RL_Qwen3_4B Shiyu-Lab/Testcase_LCB_Seen harness 30000

Next, evaluate the generated test cases with:

python -m scripts.eval {model} {prompt} --data_path {data}

For example, to evaluate the test cases generated above, run:

python -m scripts.eval Shiyu-Lab/HarnessLLM_RL_Qwen3_4B harness --data_path Shiyu-Lab/Testcase_LCB_Seen
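Generation and evaluation take the same model, dataset, and prompt arguments, so the two steps can be chained. The following is a minimal sketch, assuming it is run from the repository root. The helpers run_cmd and run_pipeline and the DRY_RUN variable are hypothetical names introduced here, not part of the repository, and the default port 30000 is simply taken from the example above.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: generate test cases, then evaluate them.
# Usage: run_pipeline {model} {data} {prompt} [port]
run_pipeline() {
  local model="$1" data="$2" prompt="$3" port="${4:-30000}"

  # The README lists only these two prompt formats.
  case "$prompt" in
    harness|inputoutput) ;;
    *)
      echo "prompt must be 'harness' or 'inputoutput', got '$prompt'" >&2
      return 1
      ;;
  esac

  # Stop before evaluation if generation fails.
  run_cmd bash bashes/generate.sh "$model" "$data" "$prompt" "$port" || return 1
  run_cmd python -m scripts.eval "$model" "$prompt" --data_path "$data"
}
```

Calling DRY_RUN=1 run_pipeline Shiyu-Lab/HarnessLLM_RL_Qwen3_4B Shiyu-Lab/Testcase_LCB_Seen harness prints the two commands without executing them, which is a cheap way to check the arguments before starting a long generation run.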

To evaluate on all datasets, you can instead run bash bashes/generate_and_eval_testcase.sh.

Training

SFT Training

We use LLaMA-Factory for SFT training. Our training data is available on Hugging Face, and the config file we use is scripts/qwen3_sft.yaml. Follow the LLaMA-Factory instructions to launch training.

RL Training

Use the script examples/grpo_trainer/run_harness.sh to train our model.

Citation

@misc{liu2025harnessllmautomatictestingharness,
      title={HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning}, 
      author={Yujian Liu and Jiabao Ji and Yang Zhang and Wenbo Guo and Tommi Jaakkola and Shiyu Chang},
      year={2025},
      eprint={2511.01104},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2511.01104}, 
}

Acknowledgement

Our implementation is based on Verl.
