Skip to content
/ holey Public
forked from namin/holey

Python library for program synthesis and symbolic execution combining constraint solving and LLMs

License

Notifications You must be signed in to change notification settings

aygp-dr/holey

 
 

Repository files navigation

Holey

A Python library for program synthesis and symbolic execution that combines Z3's constraint solving with LLM-guided synthesis. Put holes in your Python code and let holey fill them using formal constraints, natural language specifications, or both.

The symbolic execution is inspired by Philip Zucker's blog post "Symbolic Execution by Overloading __bool__", but explores all branches exhaustively instead of randomly and fleshes out the concepts towards solving Python Programming Puzzles.

The solver incorporates heuristics from LLMs in addition to symbolic execution.

Setup

Install dependencies

  • python with support for pip (e.g. conda), tested with Python 3.12
  • z3 or cvc5 or both -- on mac with Homebrew, can install with brew install z3 cvc5

Clone recursive

git clone --recursive https://github.com/namin/holey.git

Setup Python environment

make setup
conda activate holey

Or manually:

conda create -n holey python=3.12
conda activate holey
pip install -e ".[test,ollama,anthropic,google-genai,openai]"

Setup LLM environments

For each LLM you want to use, provide an LLM API key, even if only a dummy one is needed. Only provide a key if you want to use that particular LLM provider. All provided keys will be used in parallel to generate a matrix of successes per LLM provider.

export OLLAMA_API_KEY=ollama
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export OPENAI_API_KEY=...

Run

Help reference

python puzzle_solver.py --help

Sanity check

make sanity-check

Or manually:

python puzzle_solver.py --name-prefix HelloWorld:0

Run all puzzles, saving stdout/stderr to file results.txt

make run-all

Or manually:

python puzzle_solver.py >results.txt 2>&1

Fallback to LLMs

Set ANTHROPIC_API_KEY for Claude or default to local Ollama.

make run-llm

Or manually:

python puzzle_solver.py --name-prefix ListIn:1 --llm

Run tests

make test

Or manually:

python -m pytest

Current status

The symbolic execution currently solves:

  • 61% (220 out of 360) of int puzzles,
  • 33% (119 out of 363) of str puzzles,
  • 18% (9 out of 51) of float puzzles,
  • 45% (348 out of 774) overall.

with the following errors:

  • 10 timeouts after 3 seconds at staging time (while generating the SMTLIB program)
  • 158 errors at at staging time
  • 161 SMTLIB programs returning sat but the original sat function failing on synthesized model input,
  • 151 SMTLIB programs returning non-sat (e.g. unsat, unknown or timing out after 2 seconds timeouts after staging (while building the SMTLIB program), errors during staging time, the SMTLIB
  • 941 (out of 1715) puzzles not yet even attempted because their type is not int or str, such as float, list (of various specialization), etc.

Extrapolation

  • 117 smaller problems tried
  • 14 successes on smaller problem
  • 9 successful extrapolations

Matrix

  • claude (extrapolate) 6 1 0 0 1 1 0 0 0 1 0 0 1 0 1
  • claude (end-to-end) 3 0 0 0 0 0 1 0 0 0 0 0 1 0 1
  • claude (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • gemini (extrapolate) 7 0 1 1 1 1 0 0 0 1 0 1 0 0 1
  • gemini (end-to-end) 3 0 0 0 0 0 0 0 1 0 0 0 1 0 1
  • gemini (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • ollama (extrapolate) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • ollama (end-to-end) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • ollama (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Source map

.
├── README.md
├── benchmarks
│   └── PythonProgrammingPuzzles benchmark added as git submodule
├── holey
│   ├── __init__.py
│   ├── backend.py backend to SMTLIB batch processes
│   ├── core.py includes tracer, symbolic classes, ...
│   ├── llm.py support for LLM generation and code extraction
│   └── preprocessor.py includes node transformer and sat driver
├── log
│   └── results.txt example run
├── puzzle_solver.py main routine for benchmark solver
├── pyproject.toml
└── tests
└── test_core.py ran with python -m pytest, basic and LLM-generated

Contribute!

I need help in completely fleshing out the symbolic executor as well as designing and implementing LLM-based heuristics to complement it. See the contributing guidelines, in particular discussing a workflow to find and fix issues driven by the benchmarks.

About

Python library for program synthesis and symbolic execution combining constraint solving and LLMs

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.2%
  • Makefile 0.8%