
Inspect Eval Convertor

Convert your custom LLM evaluation format into the canonical Inspect AI eval format.

Quick Start

  1. Create a new example directory:

    mkdir examples/my_format
  2. Add your input data to examples/my_format/input.* (e.g., .json, .csv, .xml, .yaml).
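
     For illustration only (this schema is a hypothetical example; your converter defines what the real input looks like), a JSON input might be:

    [
      {"prompt": "What is 2 + 2?", "response": "4", "score": 1.0},
      {"prompt": "Name a prime number.", "response": "7", "score": 1.0}
    ]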

  3. Use an AI coding assistant to create the converter. Here's a prompt you can use (swap in the actual path to your input file); a minimal sketch of the resulting task.py follows these Quick Start steps:

I need you to create a converter for my LLM evaluation data format to Inspect AI's eval format.

**Repository**: This is the inspect-eval-convertor repository

**Entry Point**: Read `docs/INDEX.md` first. It explains the task, repository structure, and points to all important documentation.

**Your Task**: Create a task script at `examples/my_format/task.py` that:
- Reads my input file at `examples/my_format/input.json`
- Creates an Inspect AI Task that recreates the eval log
- Uses the `task_main()` helper to run the task and write output to `examples/my_format/input.eval`

**Important Documentation**:
- `docs/INDEX.md` - Overview and entry point
- `docs/CONVERSION_GUIDE.md` - Step-by-step guide for creating converters
- `docs/INVESTIGATING_EVAL_FILES.md` - How to investigate and validate output
- `docs/TROUBLESHOOTING.md` - Common issues and solutions

**Reference Examples**: Study these existing converters:
- `examples/simple_chat/` - Basic prompt/response conversion
- `examples/multi_turn/` - Multi-turn conversations
- `examples/tool_calling/` - Tool interactions
- `examples/csv_format/` - CSV parsing example
- `examples/xml_format/` - XML parsing example
- `examples/yaml_format/` - YAML parsing example
- `examples/forking_trajectories/` - Branching conversation paths

**Key Points**:
- Use Inspect AI's `@task` decorator to create a Task function
- Use `task_main()` from `inspect_convertor.utils` to run the task
- Define a solver (often `replay_solve()` to replay pre-recorded messages)
- Define a scorer (e.g., `score_scorer()` to read scores from metadata)
- Every sample needs at least one message and one score
- Use `ModelEvent` objects in sample metadata to populate tools for branching
- Validate the output using `inspect-convert-validate examples/my_format/input.eval`
- Investigate the output using `inspect-convert-investigate examples/my_format/input.eval`

Start by reading `docs/INDEX.md` to understand the structure, then create the converter following the patterns in the examples.
  4. Run the task:

    inspect-convert examples/my_format/input.json

    Or manually:

    uv run python examples/my_format/task.py examples/my_format/input.json
    # Creates: examples/my_format/input.eval
  5. Validate the output:

    inspect-convert-validate examples/my_format/input.eval
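
For reference, here is a minimal sketch of the kind of task.py the assistant should produce. It assumes the input is a JSON list of prompt/response records like the hypothetical example in step 2; the inspect_convertor import paths and the exact signatures of task_main(), replay_solve(), and score_scorer() are assumptions based on the names used above, so treat docs/CONVERSION_GUIDE.md and the converters in examples/ as authoritative.

# examples/my_format/task.py -- illustrative sketch only
import json
from pathlib import Path

from inspect_ai import Task, task
from inspect_ai.dataset import Sample

# Import path below is an assumption: task_main(), replay_solve(), and
# score_scorer() are named in this README, but check docs/CONVERSION_GUIDE.md
# for their real locations and signatures.
from inspect_convertor.utils import task_main, replay_solve, score_scorer


@task
def my_format() -> Task:
    # Read the custom input format (assumed: a JSON list of records).
    records = json.loads(Path("examples/my_format/input.json").read_text())
    samples = [
        Sample(
            input=record["prompt"],                   # assumed field names
            target=record.get("response", ""),
            metadata={"score": record.get("score")},  # score_scorer() reads metadata
        )
        for record in records
    ]
    # Every sample needs at least one message and one score.
    return Task(dataset=samples, solver=replay_solve(), scorer=score_scorer())


if __name__ == "__main__":
    # Assumed calling convention: runs the task and writes input.eval
    # next to the input file.
    task_main(my_format)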

What's Included

  • Utilities: Safe conversion helpers, validation tools, CLI commands
  • Examples: 8 working converters with different input formats
  • Documentation: Complete guides in docs/ directory
  • Validation: Automatic validation and investigation tools

Test Vibe-Coding a Convertor

examples/playground/input.txt ships with a dedicated output-validation script so you can test the convertor-creation workflow end to end.

# 1. Create a converter for the playground example

# 2. Run the conversion
inspect-convert examples/playground/input.txt

# 3. Validate with specialized test
uv run python examples/playground/validate.py

# If successful, you'll see:
# ✓ Playground converter validation PASSED!

CLI Commands

# Convert input to eval format (finds task.py automatically)
inspect-convert examples/my_format/input.json
# Creates: examples/my_format/input.eval

# Validate an eval file
inspect-convert-validate path/to/file.eval

# Investigate an eval file structure
inspect-convert-investigate path/to/file.eval

# Test all examples
inspect-convert-test-all

Documentation

See docs/INDEX.md for the complete documentation index.

LLM Coding Assistant Support

This repository includes configuration files for various LLM coding assistants:

  • Cursor: .cursor/rules/*.mdc files with detailed patterns
  • Claude Code: CLAUDE.md file for Claude-specific instructions
  • GitHub Copilot: .github/copilot-instructions.md for repository-wide instructions
  • General: AGENTS.md for universal agent instructions

These files ensure consistent behavior across different AI coding assistants and help guide LLMs to follow the correct task.py approach instead of deprecated patterns.

Installation

This project uses uv for package management. Install it first: https://github.com/astral-sh/uv

# Install the package
uv pip install -e .

# For development with additional tooling
uv pip install -e ".[dev]"

License

MIT
