HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation
- URL: http://arxiv.org/abs/2601.14598v1
- Date: Wed, 21 Jan 2026 02:37:33 GMT
- Title: HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation
- Authors: Yonatan Gizachew Achamyeleh, Harsh Thomare, Mohammad Abdullah Al Faruque
- Abstract summary: HELIOS is a framework that reframes binary decompilation as a structured reasoning task. HELIOS is a practical building block for reverse engineering in security settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently been applied to binary decompilation, yet they still treat code as plain text and ignore the graphs that govern program control flow. This limitation often yields syntactically fragile and logically inconsistent output, especially for optimized binaries. This paper presents HELIOS, a framework that reframes LLM-based decompilation as a structured reasoning task. HELIOS summarizes a binary's control flow and function calls into a hierarchical text representation that spells out basic blocks, their successors, and high-level patterns such as loops and conditionals. This representation is supplied to a general-purpose LLM along with raw decompiler output, optionally combined with a compiler-in-the-loop that returns error messages when the generated code fails to build. On HumanEval-Decompile for x86_64, HELIOS raises average object-file compilability from 45.0% to 85.2% for Gemini 2.0 and from 71.4% to 89.6% for GPT-4.1 Mini. With compiler feedback, compilability exceeds 94% and functional correctness improves by up to 5.6 percentage points over text-only prompting. Across six architectures drawn from x86, ARM, and MIPS, HELIOS reduces the spread in functional correctness while keeping syntactic correctness consistently high, all without fine-tuning. These properties make HELIOS a practical building block for reverse engineering workflows in security settings where analysts need recompilable, semantically faithful code across diverse hardware targets.
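The hierarchical control-flow summary the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's actual representation: the block labels, output format, and back-edge loop heuristic are all assumptions.

```python
# Hypothetical sketch of a hierarchical CFG-to-text summary of the kind
# HELIOS-style prompting might feed an LLM alongside decompiler output.
# The format and the back-edge loop heuristic are illustrative only.

def summarize_cfg(blocks, entry):
    """Render basic blocks, successors, and simple loop/branch patterns
    as an indented text outline. `blocks` maps a label to its list of
    successor labels."""
    lines = []
    visited = set()

    def walk(label, depth, path):
        indent = "  " * depth
        succs = blocks.get(label, [])
        if label in path:  # edge back into the current path => loop header
            lines.append(f"{indent}loop back to {label}")
            return
        kind = "conditional" if len(succs) == 2 else "block"
        lines.append(f"{indent}{kind} {label} -> {succs or 'exit'}")
        if label in visited:
            return
        visited.add(label)
        for s in succs:
            walk(s, depth + 1, path | {label})

    walk(entry, 0, frozenset())
    return "\n".join(lines)

# Toy CFG: a while-loop shape (cond branches to body or exit; body
# jumps back to cond).
cfg = {
    "entry": ["cond"],
    "cond": ["body", "exit"],  # two successors: a conditional
    "body": ["cond"],          # back edge: a loop
    "exit": [],
}
print(summarize_cfg(cfg, "entry"))
```

The indentation encodes the hierarchy (nesting under the branch that reaches each block), which is what lets a text-only model see structure that flat assembly hides.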
Related papers
- Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering, and malware understanding. Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible. We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z)
- QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation. This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
- SALT4Decompile: Inferring Source-level Abstract Logic Tree for LLM-Based Binary Decompilation [17.58664677898224]
SALT4Decompile is a novel binary decompilation method that abstracts stable logical features shared between binary and source code. It is highly effective in recovering the logic of the source code, significantly outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-09-18T05:57:15Z)
- WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding [58.1177179119881]
We introduce WGRAMMAR, a lightweight decoding engine that integrates domain-aware simplification, constraint decomposition, and mask caching. WGRAMMAR achieves up to 250x speedup over existing systems.
arXiv Detail & Related papers (2025-07-22T17:13:47Z)
- D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
Decompilers reconstruct human-readable source code from binaries. Despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read. With the advent of large language models (LLMs), researchers began to explore the potential of LLMs to refine decompiler output. We present D-LiFT, an enhanced decompiler-LLM pipeline fine-tuned via code quality-driven reinforcement learning.
arXiv Detail & Related papers (2025-06-11T19:09:08Z)
- LLMigrate: Transforming "Lazy" Large Language Models into Efficient Source Code Migrators [21.114491141763647]
Rewriting C code in Rust provides stronger memory safety, yet migrating large codebases such as the 32-million-line Linux kernel remains challenging. Recent Large Language Model (LLM) approaches produce more idiomatic, safe Rust programs but frequently exhibit "laziness". LLMigrate, an LLM-based C-to-Rust translation tool, splits modules into discrete functions, translates them individually, and then reintegrates them.
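The split-translate-reintegrate loop described here might look roughly like the sketch below; `split_functions` and `translate` are hypothetical stand-ins (a real pipeline would use a C parser such as clang and an actual LLM call), shown only to make the control flow concrete.

```python
# Hypothetical sketch of a split-translate-reintegrate pipeline:
# cut a module into functions, translate each one separately, then
# stitch the results back together.

def split_functions(c_source):
    # Naive splitter: assumes one top-level function per blank-line-
    # separated chunk. Real tooling would use a proper C parser.
    return [chunk for chunk in c_source.split("\n\n") if chunk.strip()]

def translate(c_function):
    # Placeholder for a per-function LLM translation request; here it
    # just echoes the function's first line as a Rust comment.
    return f"// rust translation of:\n// {c_function.splitlines()[0]}"

def migrate(c_source):
    return "\n\n".join(translate(f) for f in split_functions(c_source))

print(migrate("int a(void) { return 1; }\n\nint b(void) { return 2; }"))
```

Translating function-by-function keeps each request small, which is exactly what counters the "laziness" (truncated or elided output) that whole-module prompts tend to trigger.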
arXiv Detail & Related papers (2025-03-31T07:09:07Z)
- ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning [33.53059396922164]
Assembly code analysis and comprehension play critical roles in applications like reverse engineering. Traditional masked language modeling approaches do not explicitly focus on natural language interaction. We present Assembly Augmented Tuning (ASMA-Tune), an end-to-end structural-semantic instruction tuning framework.
arXiv Detail & Related papers (2025-03-14T17:36:08Z)
- EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking [58.15568681219339]
We introduce EquiBench, a new benchmark for evaluating large language models (LLMs) via equivalence checking. This task directly tests a model's ability to reason about program semantics. We evaluate 19 state-of-the-art LLMs and find that, in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
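Full equivalence checking is undecidable in general, so a cheap approximation is differential testing, which can only refute equivalence, never prove it. The sketch below illustrates the task EquiBench poses to models; the functions and input ranges are illustrative, not drawn from the benchmark.

```python
# Differential-testing sketch of the program-equivalence task: run two
# implementations on random inputs and report the first disagreement.
import random

def f(x):
    return x * (x + 1) // 2      # closed-form triangular number

def g(x):
    return sum(range(x + 1))     # iterative equivalent

def likely_equivalent(a, b, trials=1000, lo=0, hi=100):
    """Return (False, witness) on any mismatch, else (True, None).
    A True result is only evidence, not a proof, of equivalence."""
    for _ in range(trials):
        x = random.randint(lo, hi)
        if a(x) != b(x):
            return False, x
    return True, None

print(likely_equivalent(f, g))   # → (True, None)
```

A benchmark like EquiBench is harder than this check suggests: it asks the model to *reason* to the answer from the source code alone, without executing anything.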
arXiv Detail & Related papers (2025-02-18T02:54:25Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- LLM4Decompile: Decompiling Binary Code with Large Language Models [10.346311290153398]
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results difficult to read and execute.
We propose LLM4Decompile, the first and largest open-source LLM series (1.3B to 33B) trained to decompile binary code.
The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate.
arXiv Detail & Related papers (2024-03-08T13:10:59Z)
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.