The Sketch pipeline is designed to verify the correctness of LLM-generated assembly code by comparing it against source assembly at both semantic and symbolic levels.
- Custom Grammar (UniAssembly):
We use a hand-crafted ANTLR grammar (UniAssembly) to parse real-world ARM64 and RISC-V assembly into a structured abstract syntax tree (AST). This allows us to reason about instructions, labels, directives, and operands in a uniform way.
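The grammar itself is an ANTLR `.g4` file and is not reproduced here. As a rough, simplified stand-in for the structured output it yields, the sketch below classifies a single line and splits out the mnemonic and operands; the function name `parse_line` and the dictionary shape are illustrative, not the actual UniAssembly API.

```python
import re

def parse_line(line: str) -> dict:
    """Simplified stand-in for the UniAssembly parser: classify one assembly
    line and split it into mnemonic/operands. The real pipeline walks the
    ANTLR-generated AST instead of using string heuristics."""
    text = line.split("//")[0].strip()                 # drop ARM64-style comments
    if not text:
        return {"kind": "empty"}
    if text.endswith(":"):                             # e.g. "main:" or ".L3:"
        return {"kind": "label", "name": text[:-1]}
    if text.startswith("."):                           # e.g. ".word 4", ".align 2"
        return {"kind": "directive", "name": text.split()[0]}
    mnemonic, _, rest = text.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return {"kind": "instruction", "mnemonic": mnemonic.lower(),
            "operands": operands}

# Example: parse_line("addw a5, a5, a4")
# -> {'kind': 'instruction', 'mnemonic': 'addw', 'operands': ['a5', 'a5', 'a4']}
```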
- Attention-Based Mapping:
We start with attention maps from the model that relate predicted tokens to source tokens. These are used to align each predicted line with its most likely source line.
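A minimal sketch of this alignment step, assuming the token-level attention has already been aggregated into a line-level matrix of shape (predicted lines × source lines); how that aggregation is done is outside this sketch.

```python
import numpy as np

def align_lines(attn: np.ndarray) -> list[tuple[int, float]]:
    """For each predicted line, pick the source line receiving the most
    attention and report the normalized attention mass placed on it."""
    rows = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)   # normalize per predicted line
    best = rows.argmax(axis=1)                               # most likely source line
    weight = rows[np.arange(len(rows)), best]                # confidence of that choice
    return list(zip(best.tolist(), weight.tolist()))

# Example with 3 predicted lines and 4 source lines:
# attn = np.array([[0.70, 0.10, 0.10, 0.10],
#                  [0.05, 0.80, 0.10, 0.05],
#                  [0.10, 0.10, 0.10, 0.70]])
# align_lines(attn) -> [(0, 0.7), (1, 0.8), (3, 0.7)]
```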
- Line Comparison:
- For each aligned pair, we parse both lines using our grammar.
- We compare the type (e.g., instruction, label, directive).
- If both are instructions, we check semantic equivalence using a normalized instruction class (e.g., add, move, jump).
- Operands are compared with:
- Exact matching for immediates
- Heuristic/distance-based comparison for registers
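A minimal sketch of this comparison, operating on parsed-line dictionaries like the illustrative ones above; the instruction-class table, the register heuristic, and the 0.5/0.5 weighting are placeholders, not the pipeline's actual tables.

```python
# Illustrative instruction-class table; the real normalization covers far more
# mnemonics across ARM64 and RISC-V.
INSTR_CLASS = {
    "add": "add", "addw": "add", "addi": "add",
    "mov": "move", "mv": "move", "li": "move",
    "b": "jump", "j": "jump", "beq": "branch", "bne": "branch",
}

def same_class(m1: str, m2: str) -> bool:
    return INSTR_CLASS.get(m1) is not None and INSTR_CLASS.get(m1) == INSTR_CLASS.get(m2)

def operand_score(a: str, b: str) -> float:
    """Exact matching for immediates, a loose heuristic for registers."""
    ai, bi = a.lstrip("#"), b.lstrip("#")
    if ai.lstrip("-").isdigit() or bi.lstrip("-").isdigit():
        return 1.0 if ai == bi else 0.0        # immediates must match exactly
    if a == b:
        return 1.0                             # identical register
    return 0.5 if a[:1] == b[:1] else 0.2      # crude register "distance"

def compare_lines(pred: dict, src: dict) -> float:
    """Structural/semantic similarity of one aligned line pair, in [0, 1]."""
    if pred["kind"] != src["kind"]:
        return 0.0
    if pred["kind"] != "instruction":
        return 1.0 if pred == src else 0.5     # labels/directives: name match
    if not same_class(pred["mnemonic"], src["mnemonic"]):
        return 0.0
    n = max(len(pred["operands"]), len(src["operands"]), 1)
    total = sum(operand_score(a, b) for a, b in zip(pred["operands"], src["operands"]))
    return 0.5 + 0.5 * (total / n)             # class match is worth half, operands the rest
```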
- Line Scoring:
This gives us a confidence score for each predicted-to-source line match. Higher scores indicate stronger structural and semantic alignment.
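As a rough illustration only, the snippet below blends the structural comparison score with the attention weight of the matched source line; the 0.7/0.3 weighting is an assumption for the sketch, not the pipeline's actual formula.

```python
def line_confidence(structural: float, attention: float,
                    w_struct: float = 0.7, w_attn: float = 0.3) -> float:
    """Combine structural similarity and attention weight into one score;
    the weights here are illustrative, not tuned values from the pipeline."""
    return w_struct * structural + w_attn * attention

# e.g. line_confidence(compare_lines(pred, src), attn_weight_for_that_pair)
```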
- Finding Pure Blocks:
- From each matched instruction line, we try to extract a pure arithmetic block.
- A line is considered pure if it holds a single arithmetic instruction (no memory accesses, branches, or directives).
- We grow the block upward and downward until we hit a non-pure instruction.
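A minimal sketch of the block-growing step, assuming parsed-line dictionaries and an illustrative whitelist of pure mnemonics (the real check is driven by the grammar's instruction classes):

```python
# Illustrative whitelist of pure arithmetic mnemonics; the actual set is larger.
PURE = {"add", "addw", "addi", "sub", "subw", "mul", "and", "or", "xor",
        "sll", "srl", "sra", "slli", "srli", "srai", "sext.w", "mov", "mv", "li"}

def is_pure(parsed: dict) -> bool:
    return parsed["kind"] == "instruction" and parsed["mnemonic"] in PURE

def grow_block(lines: list[dict], seed: int) -> tuple[int, int]:
    """Grow a pure arithmetic block upward and downward from a matched line.
    Returns the half-open [start, end) range, empty if the seed is not pure."""
    if not is_pure(lines[seed]):
        return seed, seed
    start, end = seed, seed + 1
    while start > 0 and is_pure(lines[start - 1]):      # grow upward
        start -= 1
    while end < len(lines) and is_pure(lines[end]):     # grow downward
        end += 1
    return start, end
```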
- Block Verification (Z3):
- For each extracted block:
- We identify input registers (used before defined)
- The last defined register is treated as the block’s output
- Blocks are validated only if both predicted and source blocks have the same number of inputs.
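A minimal sketch of the input/output identification, assuming the usual `dst, src1, src2` operand order for pure arithmetic instructions; the helper name `block_signature` is illustrative.

```python
def block_signature(block: list[dict]) -> tuple[list[str], str]:
    """Return (input registers, output register) for a pure block:
    inputs are registers read before they are written, the output is the
    last register defined. Immediates are skipped; memory and branch forms
    never appear because blocks are pure arithmetic."""
    defined: set[str] = set()
    inputs: list[str] = []
    output = None
    for ins in block:
        dst, *srcs = ins["operands"]
        for reg in srcs:
            if reg.lstrip("#-").isdigit():
                continue                               # immediate, not an input register
            if reg not in defined and reg not in inputs:
                inputs.append(reg)                     # used before defined
        defined.add(dst)
        output = dst                                   # last definition wins
    return inputs, output

# Two blocks are only compared if len(inputs_pred) == len(inputs_src).
```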
- Symbolic Execution in Z3:
- Each block is converted into a symbolic function using Z3Py.
- Registers are modeled as BitVec variables.
- Execution is simulated using semantic mappings (e.g., add, xor, sext.w) translated to Z3 expressions.
- We use z3.ForAll(...) to assert that for all possible inputs, both blocks produce identical outputs.
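A minimal sketch of the symbolic check with Z3Py, using a deliberately tiny semantic table and 64-bit registers; the helper names and the parsed-block shape follow the illustrative sketches above and are not the pipeline's actual API.

```python
from z3 import BitVec, BitVecVal, Extract, ForAll, SignExt, Solver, sat

# Deliberately tiny semantic table; the real mapping covers many more mnemonics.
SEM = {
    "add":    lambda a, b: a + b,
    "xor":    lambda a, b: a ^ b,
    "addw":   lambda a, b: SignExt(32, Extract(31, 0, a + b)),  # RV64 32-bit add
    "sext.w": lambda a: SignExt(32, Extract(31, 0, a)),         # sign-extend low 32 bits
}

def symbolize(block, input_regs, sym_inputs):
    """Simulate a pure block over 64-bit BitVecs; return the expression
    computed for the block's output register (its last definition)."""
    env = dict(zip(input_regs, sym_inputs))        # i-th input register <- i-th symbol
    out = None
    for ins in block:
        dst, *srcs = ins["operands"]
        args = [BitVecVal(int(s.lstrip("#")), 64) if s.lstrip("#-").isdigit() else env[s]
                for s in srcs]                     # immediates become constants
        env[dst] = SEM[ins["mnemonic"]](*args)
        out = env[dst]
    return out

def blocks_equivalent(pred, pred_in, src, src_in) -> bool:
    """Assert with z3.ForAll that both blocks produce identical outputs for all
    inputs (inputs are paired positionally; assumes at least one input)."""
    xs = [BitVec(f"x{i}", 64) for i in range(len(pred_in))]
    solver = Solver()
    solver.add(ForAll(xs, symbolize(pred, pred_in, xs) == symbolize(src, src_in, xs)))
    return solver.check() == sat
```

If the check comes back unsatisfiable, the blocks disagree on at least one input; solving the negated, quantifier-free query (`symbolize(pred, ...) != symbolize(src, ...)`) and reading `solver.model()` yields a concrete counterexample, which is the signal used for flagging and, eventually, correction.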
This enables us to:
- Automatically flag semantically incorrect or suspicious generated blocks
- Prepare the groundwork for automated correction, guided by Z3 counterexamples and model attention scores
The end goal is a verification-aware generation loop, where symbolic mismatch acts as a correctness signal to further train, fine-tune, or patch the model’s outputs.
