Disa: Accurate Learning-based Static Disassembly with Attentions
- URL: http://arxiv.org/abs/2507.07246v1
- Date: Wed, 09 Jul 2025 19:36:57 GMT
- Title: Disa: Accurate Learning-based Static Disassembly with Attentions
- Authors: Peicheng Wang, Monika Santra, Mingyu Liu, Cong Sun, Dongrui Zeng, Gang Tan,
- Abstract summary: Disa is a new learning-based disassembly approach that uses the information of superset instructions over the multi-head self-attention to learn the instructions' correlations.<n>Disa outperforms prior deep-learning disassembly approaches in function entry-point identification.
- Score: 19.40730097748233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For reverse engineering related security domains, such as vulnerability detection, malware analysis, and binary hardening, disassembly is crucial yet challenging. The fundamental challenge of disassembly is to identify instruction and function boundaries. Classic approaches rely on file-format assumptions and architecture-specific heuristics to guess the boundaries, resulting in incomplete and incorrect disassembly, especially when the binary is obfuscated. Recent advancements of disassembly have demonstrated that deep learning can improve both the accuracy and efficiency of disassembly. In this paper, we propose Disa, a new learning-based disassembly approach that uses the information of superset instructions over the multi-head self-attention to learn the instructions' correlations, thus being able to infer function entry-points and instruction boundaries. Disa can further identify instructions relevant to memory block boundaries to facilitate an advanced block-memory model based value-set analysis for an accurate control flow graph (CFG) generation. Our experiments show that Disa outperforms prior deep-learning disassembly approaches in function entry-point identification, especially achieving 9.1% and 13.2% F1-score improvement on binaries respectively obfuscated by the disassembly desynchronization technique and popular source-level obfuscator. By achieving an 18.5% improvement in the memory block precision, Disa generates more accurate CFGs with a 4.4% reduction in Average Indirect Call Targets (AICT) compared with the state-of-the-art heuristic-based approach.
Related papers
- Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks [13.11143749397866]
NeuCall is a novel approach for resolving indirect calls using graph neural networks.<n>We leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs.<n>NeuCall achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches.
arXiv Detail & Related papers (2025-07-24T20:54:41Z) - Explainable Vulnerability Detection in C/C++ Using Edge-Aware Graph Attention Networks [0.2499907423888049]
This paper presents ExplainVulD, a graph-based framework for vulnerability detection in C/C++ code.<n>It achieves a mean accuracy of 88.25 percent and an F1 score of 48.23 percent across 30 independent runs on the ReVeal dataset.
arXiv Detail & Related papers (2025-07-22T12:49:14Z) - LeakGuard: Detecting Memory Leaks Accurately and Scalably [3.256598917442277]
LeakGuard is a memory leak detection tool which provides satisfactory balance of accuracy and scalability.<n>For accuracy, LeakGuard analyzes the behaviors of library and developer-defined memory allocation and deallocation functions.<n>For scalability, LeakGuard examines each function of interest independently by using its function summary and under-constrained symbolic execution technique.
arXiv Detail & Related papers (2025-04-06T09:11:37Z) - Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.<n>Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.<n> Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - Enhancing Infrared Small Target Detection Robustness with Bi-Level
Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions.
Our scheme remarkably improves 21.96% IOU across a wide array of corruptions and notably promotes 4.97% IOU on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z) - NeuDep: Neural Binary Memory Dependence Analysis [28.33030658966508]
We present a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute.
We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled by 2 compilers, 4 optimizations, and 4 obfuscation passes.
arXiv Detail & Related papers (2022-10-04T04:59:36Z) - Statement-Level Vulnerability Detection: Learning Vulnerability Patterns Through Information Theory and Contrastive Learning [31.15123852246431]
We propose a novel end-to-end deep learning-based approach to identify the vulnerability-relevant code statements of a specific function.
Inspired by the structures observed in real-world vulnerable code, we first leverage mutual information for learning a set of latent variables.
We then propose novel clustered spatial contrastive learning in order to further improve the representation learning.
arXiv Detail & Related papers (2022-09-20T00:46:20Z) - SimCLF: A Simple Contrastive Learning Framework for Function-level
Binary Embeddings [2.1222884030559315]
We propose SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings.
We take an unsupervised learning approach and formulate binary code similarity detection as instance discrimination.
SimCLF directly operates on disassembled binary functions and could be implemented with any encoder.
arXiv Detail & Related papers (2022-09-06T12:09:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.