heimdall-eval | Jonathan Becker

Structured evaluation framework for heimdall-rs's decompilation and CFG generation using Claude as an LLM judge.

Overview

heimdall-eval provides a structured approach to evaluating and benchmarking Heimdall's decompilation accuracy and CFG generation quality. It uses Claude as an LLM judge to compare decompiled output against original Solidity source code, scoring based on logical preservation rather than syntactic similarity.

The evaluation framework assesses:

Decompilation accuracy (arithmetic, control flow, storage operations, external calls)
Control flow graph completeness and correctness

Project Structure

    heimdall-eval/
    ├── evals/           # Solidity test cases (Foundry projects)
    │   ├── loops/       # Can contain multiple contracts
    │   ├── nested-mappings/
    │   └── weth9/
    ├── heimdall/        # Decompiled outputs and evaluation results
    │   ├── <Contract>/  # Output per contract
    │   └── evals.json   # Aggregated scores
    ├── prompts/         # LLM evaluation prompts
    ├── scripts/         # Build and evaluation scripts
    └── Makefile

Usage

Prerequisites

Heimdall installed and available in PATH
Foundry for compiling Solidity test cases
Claude Code CLI for running evaluations
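
Before running any target, it can help to confirm that the three tools are reachable. The following is a minimal sketch, not part of heimdall-eval: it assumes the executables are named heimdall, forge (Foundry's build tool), and claude (the Claude Code CLI), and the helper missing_tools is a hypothetical name. Adjust the list if your installation uses different names.

```python
import shutil

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ["heimdall", "forge", "claude"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

An empty result means every listed executable resolved on PATH; it does not check versions.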

Commands

Run decompilation on a specific target:

    make run <target>

Run decompilation on all targets:

    make run-all

Evaluate a specific target (runs decompilation + LLM evaluation):

    make eval <target>

Evaluate all targets:

    make eval-all

Use a development build of Heimdall:

    make eval-all DEV=1

Results

Evaluation scores are written to heimdall/evals.json:

    {
      "SimpleLoop": { "cfg": 100, "decompilation": 25 },
      "NestedLoop": { "cfg": 100, "decompilation": 25 },
      "WhileLoop": { "cfg": 100, "decompilation": 25 },
      "WETH9": { "cfg": 100, "decompilation": 65 }
    }
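
To get a quick overview of a results file, the scores can be averaged per metric and low scorers flagged. This is a rough sketch, assuming evals.json keeps the shape shown above (contract name mapped to numeric cfg and decompilation scores). The function names and the threshold of 50 are illustrative choices, not part of heimdall-eval.

```python
import json
from pathlib import Path


def summarize(scores):
    """Return the mean score per metric across all contracts."""
    per_metric = {}
    for metrics in scores.values():
        for metric, value in metrics.items():
            per_metric.setdefault(metric, []).append(value)
    return {metric: sum(vals) / len(vals) for metric, vals in per_metric.items()}


def below_threshold(scores, metric, threshold):
    """Return sorted contract names whose `metric` score is under `threshold`.

    A contract that lacks the metric entirely is treated as scoring 0.
    """
    return sorted(
        name for name, metrics in scores.items()
        if metrics.get(metric, 0) < threshold
    )


if __name__ == "__main__":
    path = Path("heimdall/evals.json")  # location stated in this README
    if path.exists():
        scores = json.loads(path.read_text())
        print("Mean scores:", summarize(scores))
        print("Decompilation under 50:", below_threshold(scores, "decompilation", 50))
    else:
        print(f"{path} not found; run an evaluation first.")
```

For the sample results above, the mean cfg score is 100 and the mean decompilation score is 35, with the three loop contracts falling under the 50 threshold.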

Adding Test Cases

Create a new Foundry project in evals/<name>/
Add Solidity source files to evals/<name>/src/ (supports multiple contracts per eval)
Run make eval <name> to generate and evaluate decompiled output

Each eval can contain multiple contracts. For example, the loops eval contains SimpleLoop.sol, NestedLoop.sol, and WhileLoop.sol, which are all evaluated together.
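
To check which contracts an eval will pick up, the source directory can be listed directly. A small sketch under the layout described above (evals/<name>/src/ holding .sol files); contracts_in_eval is a hypothetical helper, and it only looks at the top level of src/, which may differ from how the project's own scripts discover contracts.

```python
from pathlib import Path


def contracts_in_eval(evals_root, name):
    """Return sorted .sol file names directly under <evals_root>/<name>/src/."""
    src = Path(evals_root) / name / "src"
    return sorted(p.name for p in src.glob("*.sol"))


if __name__ == "__main__":
    # Prints the Solidity sources found for the loops eval, if present.
    for filename in contracts_in_eval("evals", "loops"):
        print(filename)
```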

Contributing

If you'd like to contribute test cases or improve the evaluation prompts, please open a pull request with your changes.

Issues

If you've found a bug or have a question, please open an issue.

Credits

heimdall-eval is maintained by Jonathan Becker as part of the Heimdall project.