Convert your custom LLM evaluation format into the canonical Inspect AI eval format.
- Create a new example directory: `mkdir examples/my_format`
- Add your input data to `examples/my_format/input.*` (e.g. `.json`, `.csv`, `.xml`, or `.yaml`).
- Use an AI coding assistant to create the converter (a minimal `task.py` sketch appears after this list). Here's a prompt you can use, replacing the input path with the actual path to your file:

I need you to create a converter for my LLM evaluation data format to Inspect AI's eval format.
**Repository**: This is the inspect-eval-convertor repository
**Entry Point**: Read `docs/INDEX.md` first. It explains the task, repository structure, and points to all important documentation.
**Your Task**: Create a task script at `examples/my_format/task.py` that:
- Reads my input file at `examples/my_format/input.json`
- Creates an Inspect AI Task that recreates the eval log
- Uses the `task_main()` helper to run the task and write output to `examples/my_format/input.eval`
**Important Documentation**:
- `docs/INDEX.md` - Overview and entry point
- `docs/CONVERSION_GUIDE.md` - Step-by-step guide for creating converters
- `docs/INVESTIGATING_EVAL_FILES.md` - How to investigate and validate output
- `docs/TROUBLESHOOTING.md` - Common issues and solutions
**Reference Examples**: Study these existing converters:
- `examples/simple_chat/` - Basic prompt/response conversion
- `examples/multi_turn/` - Multi-turn conversations
- `examples/tool_calling/` - Tool interactions
- `examples/csv_format/` - CSV parsing example
- `examples/xml_format/` - XML parsing example
- `examples/yaml_format/` - YAML parsing example
- `examples/forking_trajectories/` - Branching conversation paths
**Key Points**:
- Use Inspect AI's `@task` decorator to create a Task function
- Use `task_main()` from `inspect_convertor.utils` to run the task
- Define a solver (often `replay_solve()` to replay pre-recorded messages)
- Define a scorer (e.g., `score_scorer()` to read scores from metadata)
- Every sample needs at least one message and one score
- Use `ModelEvent` objects in sample metadata to populate tools for branching
- Validate the output using `inspect-convert-validate examples/my_format/input.eval`
- Investigate the output using `inspect-convert-investigate examples/my_format/input.eval`
Start by reading `docs/INDEX.md` to understand the structure, then create the converter following the patterns in the examples.

- Run the task: `inspect-convert examples/my_format/input.json`, or manually: `uv run python examples/my_format/task.py examples/my_format/input.json`. Either way, this creates `examples/my_format/input.eval`.
- Validate the output: `inspect-convert-validate examples/my_format/input.eval`
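
To make the Key Points in the prompt concrete, here is a minimal sketch of what `examples/my_format/task.py` might look like for a simple prompt/response input. The input shape, the metadata keys, and the exact signatures of `replay_solve()`, `score_scorer()`, and `task_main()` are illustrative assumptions only; `docs/CONVERSION_GUIDE.md` and the converters under `examples/` are the authoritative references.

```python
# Minimal sketch of examples/my_format/task.py, assuming input.json looks like:
#   [{"prompt": "...", "response": "...", "score": 1.0}, ...]
# The helper signatures (replay_solve, score_scorer, task_main) and the metadata
# keys are assumptions for illustration; see docs/CONVERSION_GUIDE.md for the
# patterns this repository actually uses.
import json
from pathlib import Path

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.model import ChatMessageAssistant, ChatMessageUser

from inspect_convertor.utils import replay_solve, score_scorer, task_main

INPUT_FILE = Path(__file__).parent / "input.json"


@task
def my_format() -> Task:
    records = json.loads(INPUT_FILE.read_text())
    samples = []
    for record in records:
        # Every sample needs at least one message and one score.
        samples.append(
            Sample(
                input=record["prompt"],
                metadata={
                    "messages": [
                        ChatMessageUser(content=record["prompt"]),
                        ChatMessageAssistant(content=record["response"]),
                    ],
                    "score": record["score"],
                },
            )
        )
    return Task(
        dataset=samples,
        solver=replay_solve(),   # replays the pre-recorded messages
        scorer=score_scorer(),   # reads the score back out of sample metadata
    )


if __name__ == "__main__":
    # Runs the task and writes examples/my_format/input.eval next to the input.
    task_main(my_format)
```

Once a working `task.py` exists, `inspect-convert` locates and runs it automatically, as shown in the commands below.
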
- Utilities: Safe conversion helpers, validation tools, CLI commands
- Examples: 8 working converters with different input formats
- Documentation: Complete guides in the `docs/` directory
- Validation: Automatic validation and investigation tools
`examples/playground/input.txt` is included with an output validation script to test convertor creation.

```bash
# 1. Create a converter for the playground example
# 2. Run the conversion
inspect-convert examples/playground/input.txt
# 3. Validate with specialized test
uv run python examples/playground/validate.py
# If successful, you'll see:
# ✓ Playground converter validation PASSED!
```

```bash
# Convert input to eval format (finds task.py automatically)
inspect-convert examples/my_format/input.json
# Creates: examples/my_format/input.eval
# Validate an eval file
inspect-convert-validate path/to/file.eval
# Investigate an eval file structure
inspect-convert-investigate path/to/file.eval
# Test all examples
inspect-convert-test-all
```

See `docs/INDEX.md` for the complete documentation index.
This repository includes configuration files for various LLM coding assistants:
- Cursor: `.cursor/rules/*.mdc` files with detailed patterns
- Claude Code: `CLAUDE.md` file for Claude-specific instructions
- GitHub Copilot: `.github/copilot-instructions.md` for repository-wide instructions
- General: `AGENTS.md` for universal agent instructions

These files ensure consistent behavior across different AI coding assistants and help guide LLMs to follow the correct task.py approach instead of deprecated patterns.
This project uses uv for package management. Install it first: https://github.com/astral-sh/uv

```bash
# Install the package
uv pip install -e .

# For development with additional tooling
uv pip install -e ".[dev]"
```

License: MIT