
Structured evaluation framework for heimdall-rs's decompilation and CFG generation using Claude as an LLM judge.
Overview
heimdall-eval provides a structured approach to evaluating and benchmarking Heimdall's decompilation accuracy and CFG generation quality. It uses Claude as an LLM judge to compare decompiled output against original Solidity source code, scoring based on logical preservation rather than syntactic similarity.
The evaluation framework assesses:
- Decompilation accuracy (arithmetic, control flow, storage operations, external calls)
- Control flow graph completeness and correctness
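To show why a textual diff is a poor proxy for logical preservation, the sketch below compares two Solidity snippets with Python's `difflib`. Both snippets are hand-written, hypothetical stand-ins (not real Heimdall output), and the helper name `syntactic_similarity` is invented for this example. This is not how heimdall-eval scores anything; the actual scoring is done by the LLM judge.

```python
import difflib

# Hypothetical original source: sums 0..n-1 with a for loop.
original = """
function sum(uint256 n) public pure returns (uint256 total) {
    for (uint256 i = 0; i < n; i++) {
        total += i;
    }
}
"""

# Hypothetical decompiler-style output: same logic, different names and loop form.
decompiled = """
function func_a(uint256 arg0) public pure returns (uint256) {
    uint256 var_a = 0;
    uint256 var_b = 0;
    while (var_b < arg0) {
        var_a = var_a + var_b;
        var_b = var_b + 1;
    }
    return var_a;
}
"""


def syntactic_similarity(a: str, b: str) -> float:
    """Return a character-level similarity ratio between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()


score = syntactic_similarity(original, decompiled)
# The two snippets compute the same value, yet the ratio is below 1.0
# because variable names and loop syntax differ.
print(f"syntactic similarity: {score:.2f}")
```

A purely syntactic metric penalizes the renamed variables and the `while` rewrite even though the behavior is the same, which is the gap a judge focused on logic is meant to close.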
Project Structure
    heimdall-eval/
    ├── evals/               # Solidity test cases (Foundry projects)
    │   ├── loops/           # Can contain multiple contracts
    │   ├── nested-mappings/
    │   └── weth9/
    ├── heimdall/            # Decompiled outputs and evaluation results
    │   ├── <Contract>/      # Output per contract
    │   └── evals.json       # Aggregated scores
    ├── prompts/             # LLM evaluation prompts
    ├── scripts/             # Build and evaluation scripts
    └── Makefile
Usage
Prerequisites
- Heimdall installed and available in PATH
- Foundry for compiling Solidity test cases
- Claude Code CLI for running evaluations
Commands
Run decompilation on a specific target:
    make run <target>
Run decompilation on all targets:
    make run-all
Evaluate a specific target (runs decompilation + LLM evaluation):
    make eval <target>
Evaluate all targets:
    make eval-all
Use a development build of Heimdall:
    make eval-all DEV=1
Results
Evaluation scores are written to heimdall/evals.json:
    {
      "SimpleLoop": { "cfg": 100, "decompilation": 25 },
      "NestedLoop": { "cfg": 100, "decompilation": 25 },
      "WhileLoop": { "cfg": 100, "decompilation": 25 },
      "WETH9": { "cfg": 100, "decompilation": 65 }
    }
Adding Test Cases
- Create a new Foundry project in evals/<name>/
- Add Solidity source files to evals/<name>/src/ (supports multiple contracts per eval)
- Run make eval <name> to generate and evaluate decompiled output
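As a quick check of the layout these steps describe, the following sketch lists the Solidity files found under each eval's src/ directory. It is not part of heimdall-eval: the function name `list_eval_sources` is hypothetical, and it assumes each `.sol` file name corresponds to one contract, which need not hold in general.

```python
from pathlib import Path


def list_eval_sources(root: str = "evals") -> dict[str, list[str]]:
    """Map each eval directory name to the sorted .sol file stems in its src/."""
    result: dict[str, list[str]] = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to list when run outside the repository root.
        return result
    for eval_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        src = eval_dir / "src"
        stems = sorted(f.stem for f in src.glob("*.sol")) if src.is_dir() else []
        result[eval_dir.name] = stems
    return result


# Assumption: the script is run from the heimdall-eval repository root.
for name, contracts in list_eval_sources().items():
    print(f"{name}: {', '.join(contracts) or '(no sources found)'}")
```

An eval that prints "(no sources found)" is most likely missing its src/ directory or has its contracts in a different location.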
Each eval can contain multiple contracts. For example, the loops eval contains SimpleLoop.sol, NestedLoop.sol, and WhileLoop.sol, which are all evaluated together.
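Once an eval has run, the aggregated scores in heimdall/evals.json can be summarized with a few lines of Python. The sketch below is not part of heimdall-eval. The names `SAMPLE`, `summarize`, and `load_scores` are hypothetical, and it assumes the file keeps the per-contract `{"cfg": ..., "decompilation": ...}` shape shown in the Results section. The sample data is copied from that section.

```python
import json
from pathlib import Path

# Sample scores copied from the Results section of this README.
SAMPLE = {
    "SimpleLoop": {"cfg": 100, "decompilation": 25},
    "NestedLoop": {"cfg": 100, "decompilation": 25},
    "WhileLoop": {"cfg": 100, "decompilation": 25},
    "WETH9": {"cfg": 100, "decompilation": 65},
}


def load_scores(path: str = "heimdall/evals.json") -> dict:
    """Read the aggregated scores file (assumed shape: contract -> metric -> score)."""
    return json.loads(Path(path).read_text())


def summarize(scores: dict) -> dict:
    """Return the mean of each metric across all contracts."""
    collected: dict[str, list[float]] = {}
    for metrics in scores.values():
        for metric, value in metrics.items():
            collected.setdefault(metric, []).append(value)
    return {metric: sum(vals) / len(vals) for metric, vals in collected.items()}


# Use the real file when it exists, otherwise fall back to the sample data.
scores_file = Path("heimdall/evals.json")
scores = load_scores(str(scores_file)) if scores_file.is_file() else SAMPLE
for metric, mean in summarize(scores).items():
    print(f"{metric}: {mean:.1f}")
```

Averaging per metric keeps CFG and decompilation scores separate, so a regression in one is not masked by the other.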
Contributing
If you'd like to contribute test cases or improve the evaluation prompts, please open a pull request with your changes.
Issues
If you've found an issue or have a question, please open an issue in this repository.
Credits
heimdall-eval is maintained by Jonathan Becker as part of the Heimdall project.