This repository contains the implementation and evaluation framework for the paper "Addressing Data Leakage in HumanEval Using Combinatorial Test Design". The repository is organized as follows:
```
combinatorial-benchmark/
├── humanEval benchmark/                      # HumanEval-style programming challenges
│   ├── prompts/                              # Individual problem definitions
│   │   └── problem[1-10].json                # Problem specifications and test cases
│   ├── generated_outputs_problem[1-10].txt   # Model outputs for each problem
│   ├── huggingfaceBenchmark.ipynb            # Jupyter notebook for HuggingFace model evaluation
│   └── human.py                              # Human baseline implementation
├── meta benchmark/                           # Meta-learning benchmark tasks
│   ├── meta_prompts/                         # Meta-learning problem definitions
│   │   └── problem_[1-10].json               # Meta-problem specifications
│   └── generated_outputs_problem[1-10].txt   # Model outputs for meta-problems
├── main.py                                   # Main evaluation script
└── README.md                                 # This file
```
The `humanEval benchmark/` directory contains 10 programming problems from the HumanEval benchmark.

- Each problem in `prompts/` includes:
  - Problem description
  - Input/output specifications
  - Test cases
  - Evaluation criteria
- Generated outputs are stored in separate files for analysis
- `huggingfaceBenchmark.ipynb` is a Jupyter notebook for evaluating HuggingFace models
- `human.py` provides code to evaluate the HumanEval dataset samples
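For illustration, a problem file under `prompts/` can be loaded as sketched below. The field names used here (`description`, `input_output`, `tests`) are assumptions about the JSON schema, not taken from the repository; check the actual files for the real structure.

```python
import json
from pathlib import Path

# Load one HumanEval-style problem definition and inspect its contents.
# NOTE: the field names below are hypothetical placeholders.
problem_path = Path("humanEval benchmark/prompts/problem1.json")
with problem_path.open() as f:
    problem = json.load(f)

print(problem.get("description"))       # natural-language problem statement
print(problem.get("input_output"))      # input/output specifications
for test in problem.get("tests", []):   # test cases used to check solutions
    print(test)
```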
The `meta benchmark/` directory features 10 template problems that assess LLMs' ability to generalize across problem patterns.

- Problems in `meta_prompts/` include:
  - Template descriptions
  - Instance generation rules
  - Evaluation metrics
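A rough sketch of how a template problem might be expanded into concrete instances is shown below; the `template` and `parameters` fields are assumed names for illustration only, not the repository's actual format.

```python
import json
from pathlib import Path

# Hypothetical expansion of a meta-problem template into concrete prompts.
meta = json.loads(Path("meta benchmark/meta_prompts/problem_1.json").read_text())

template = meta["template"]                 # e.g. "Return the {k}-th largest element of {xs}."
for params in meta.get("parameters", []):   # one substitution dict per generated instance
    prompt = template.format(**params)
    print(prompt)
```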
The `main.py` script provides functionality for:
- Loading and processing problem definitions
- Interfacing with different LLM providers (OpenAI, Anthropic, Ollama)
- Running evaluations and collecting results
- Testing generated solutions against provided test cases
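As a minimal sketch of the last step, a generated solution can be executed against a problem's test cases roughly as follows. This is not the repository's actual implementation; the `entry_point`, `inputs`, and `expected` names are assumptions for illustration.

```python
# Sketch: score one model-generated solution against a list of test cases.
def run_tests(solution_code: str, entry_point: str, tests: list[dict]) -> float:
    namespace: dict = {}
    exec(solution_code, namespace)            # define the candidate function
    func = namespace[entry_point]
    passed = 0
    for case in tests:
        try:
            if func(*case["inputs"]) == case["expected"]:
                passed += 1
        except Exception:
            pass                              # a crashing test case counts as a failure
    return passed / len(tests) if tests else 0.0

score = run_tests("def add(a, b):\n    return a + b", "add",
                  [{"inputs": [1, 2], "expected": 3}])
print(f"pass rate: {score:.0%}")
```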
To get started:

- Install dependencies:

  ```bash
  pip install openai anthropic requests
  ```

- Set up API keys in environment variables:

  ```bash
  export OPENAI_API_KEY="your_key_here"
  export ANTHROPIC_API_KEY="your_key_here"
  ```

- Run evaluations:

  ```bash
  python main.py
  ```
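Since the script reads these keys from the environment, a quick preflight check (not part of the repository, just a convenience) can catch a missing key before a run:

```python
import os

# Hypothetical helper: verify the required API keys are exported before running main.py.
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    if not os.environ.get(var):
        raise SystemExit(f"{var} is not set; export it before running main.py")
```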