Related papers: Evaluating Disassembly Errors With Only Binaries

Evaluating Disassembly Errors With Only Binaries

URL: http://arxiv.org/abs/2506.20109v2
Date: Fri, 04 Jul 2025 06:52:35 GMT
Title: Evaluating Disassembly Errors With Only Binaries
Authors: Lambang Akbar Wijayadi, Yuancheng Jiang, Roland H. C. Yap, Zhenkai Liang, Zhuohao Liu,
Abstract summary: This work is the first to evaluate disassembly errors using only the binary.<n>TraceBin targets the use case where the disassembly is used in an automated fashion for security tasks on a target binary.<n>It is also helpful in automated security tasks on (closed source) binaries relying on disassemblers.
Score: 8.416922409145759
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Disassemblers are crucial in the analysis and modification of binaries. Existing works showing disassembler errors largely rely on practical implementation without specific guarantees and assume source code and compiler toolchains to evaluate ground truth. However, the assumption of source code is contrary to typical binary scenarios where only the binary is available. In this work, we investigate an approach with minimal assumptions and a sound approach to disassembly error evaluation that does not require source code. Any source code does not address the fundamental problem of binary disassembly and fails when only the binary exists. As far as we know, this is the first work to evaluate disassembly errors using only the binary. We propose TraceBin, which uses dynamic execution to find disassembly errors. TraceBin targets the use case where the disassembly is used in an automated fashion for security tasks on a target binary, such as static binary instrumentation, binary hardening, automated code repair, and so on, which may be affected by disassembly errors. Discovering disassembly errors in the target binary aids in reducing problems caused by such errors. Furthermore, we are not aware of existing approaches that can evaluate errors given only a target binary, as they require source code. Our evaluation shows TraceBin finds: (i) errors consistent with existing studies even without source; (ii) disassembly errors due to control flow; (iii) new interesting errors; (iv) errors in non-C/C++ binaries; (v) errors in closed-source binaries; and (vi) show that disassembly errors can have significant security implications. Overall, our experimental results show that TraceBin finds many errors in existing popular disassemblers. It is also helpful in automated security tasks on (closed source) binaries relying on disassemblers.

Related papers

Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding.<n>Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible.<n>We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z)
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation [12.983487033256448]
Decompile-Bench is the first open-source dataset comprising two million binary-source function pairs condensed from 100 million collected function pairs.<n>For the evaluation purposes, we developed a benchmark Decompile-Bench-Eval including manually crafted binaries from the well-established HumanEval and MBPP.<n>We find that fine-tuning with Decompile-Bench causes a 20% improvement over previous benchmarks in terms of the re-executability rate.
arXiv Detail & Related papers (2025-05-19T03:34:33Z)
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding [50.17907898478795]
This work proposes a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in real-world reverse engineering scenarios.<n>Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2025-04-30T17:02:06Z)
Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code [55.493408628371235]
We propose ByteTR, a framework for recovering variable types in binary code.<n>In light of the ubiquity of variable propagation across functions, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data flow dependencies for variable type recovery.
arXiv Detail & Related papers (2025-03-10T12:27:05Z)
ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries [2.696054049278301]
We introduce DeBinVul, a novel decompiled binary code vulnerability dataset. We fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in detecting binary code vulnerabilities.
arXiv Detail & Related papers (2024-11-07T18:54:31Z)
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce V Debugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step. V Debugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy. Evaluations on six datasets demonstrate V Debugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning [19.22004583230725]
We propose BinGo, a new security patch detection system for binary code. BinGo consists of four phases, namely, patch data pre-processing, graph extraction, embedding generation, and graph representation learning. Our experimental results show BinGo can achieve up to 80.77% accuracy in identifying security patches between two neighboring versions of binary code.
arXiv Detail & Related papers (2023-12-13T06:35:39Z)
Automatically Mitigating Vulnerabilities in Binary Programs via Partially Recompilable Decompilation [8.31538179550799]
We propose Partially Recompilable Decompilation (PRD) PRD lifts suspect binary functions to source, available for analysis, revision, or review, and creates a patched binary. We evaluate PRD in two contexts: a fully automated process incorporating source-level Automated Program Repair (APR) methods; human-edited source-level repairs.
arXiv Detail & Related papers (2022-02-24T19:48:45Z)
Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas. We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data. Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data. BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.