Context-Guided Decompilation: A Step Towards Re-executability
- URL: http://arxiv.org/abs/2511.01763v1
- Date: Mon, 03 Nov 2025 17:21:39 GMT
- Title: Context-Guided Decompilation: A Step Towards Re-executability
- Authors: Xiaohan Wang, Yuxin Hu, Kevin Leach
- Abstract summary: Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding. Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible. We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
- Score: 50.71992919223209
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Binary decompilation plays an important role in software security analysis, reverse engineering, and malware understanding when source code is unavailable. However, existing decompilation techniques often fail to produce source code that can be successfully recompiled and re-executed, particularly for optimized binaries. Recent advances in large language models (LLMs) have enabled neural approaches to decompilation, but the generated code is typically only semantically plausible rather than truly executable, limiting their practical reliability. These shortcomings arise from compiler optimizations and the loss of semantic cues in compiled code, which LLMs struggle to recover without contextual guidance. To address this challenge, we propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code. We evaluate our method across multiple datasets, optimization levels, and compilers, demonstrating around 40% improvement in re-executability over state-of-the-art decompilation methods while maintaining robustness.
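The abstract's two central ideas, in-context examples placed ahead of the target and re-executability as the success criterion, can be illustrated with a small sketch. This is not the ICL4Decomp implementation: the `Outcome` record, the prompt layout, and both function names are hypothetical, and the sketch assumes that a sample counts as re-executable only when the decompiled source both recompiles and passes its original tests.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical record for one decompiled function."""
    recompiled: bool    # the decompiled source compiled without errors
    tests_passed: bool  # the recompiled program passed the original tests


def re_executability_rate(outcomes: list[Outcome]) -> float:
    """Fraction of samples that both recompile and pass their tests."""
    if not outcomes:
        return 0.0
    ok = sum(1 for o in outcomes if o.recompiled and o.tests_passed)
    return ok / len(outcomes)


def build_icl_prompt(examples: list[tuple[str, str]], target_asm: str) -> str:
    """Generic few-shot layout (an assumption): solved (assembly, source)
    pairs first, then the target assembly with an empty source slot."""
    parts = [f"# Assembly:\n{asm}\n# Source:\n{src}\n" for asm, src in examples]
    parts.append(f"# Assembly:\n{target_asm}\n# Source:\n")
    return "\n".join(parts)
```

Under this definition, a function that recompiles but fails its tests counts as a failure, which is what separates re-executability from mere recompilability.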
Related papers
- Verifiable Provenance of Software Artifacts with Zero-Knowledge Compilation [5.939983212292006]
We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. Our results show that zk-compilation is applicable to real-world software and provides strong security guarantees.
arXiv Detail & Related papers (2026-02-12T12:36:36Z)
- QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation. This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
- SALT4Decompile: Inferring Source-level Abstract Logic Tree for LLM-Based Binary Decompilation [17.58664677898224]
SALT4Decompile is a novel binary decompilation method that abstracts stable logical features between binary and source code. SALT4Decompile is highly effective in recovering the logic of the source code, significantly outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-09-18T05:57:15Z)
- D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
Decompilers reconstruct human-readable source code from binaries. Despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read. With the advent of large language models (LLMs), researchers began to explore the potential of LLMs to refine decompiler output. We present D-LiFT, an enhanced decompiler-LLM pipeline with fine-tuned reinforcement learning.
arXiv Detail & Related papers (2025-06-11T19:09:08Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation [57.604506522287814]
Existing large language models (LLMs) only learn the contextual semantics of code during pre-training. We propose ExeCoder to utilize executability representations such as functional semantics, syntax structures, and variable dependencies. We show that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics.
arXiv Detail & Related papers (2025-01-30T16:18:52Z)
- Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement [43.2637367483626]
Decompilation transforms compiled code back into a high-level programming language when source code is unavailable.
Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training.
By integrating these two methods, we achieved a Re-Executability performance improvement of approximately 3.90% on the Decompile-Eval benchmark, establishing a new state-of-the-art performance of 52.41%.
arXiv Detail & Related papers (2024-06-25T02:37:53Z)
- LLM4Decompile: Decompiling Binary Code with Large Language Models [10.346311290153398]
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results difficult to read and execute.
We propose LLM4Decompile, the first and largest open-source LLM series (1.3B to 33B) trained to decompile binary code.
The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate.
arXiv Detail & Related papers (2024-03-08T13:10:59Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Refining Decompiled C Code with Large Language Models [15.76430362775126]
A C decompiler converts an executable into source code.
The recovered C source code, once re-compiled, is expected to produce an executable with the same functionality as the original executable.
arXiv Detail & Related papers (2023-10-10T11:22:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.