Related papers: Disassembling Obfuscated Executables with LLM

Disassembling Obfuscated Executables with LLM

URL: http://arxiv.org/abs/2407.08924v1
Date: Fri, 12 Jul 2024 02:10:07 GMT
Title: Disassembling Obfuscated Executables with LLM
Authors: Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang,
Abstract summary: We present DisasLLM, a novel dissembler to overcome the challenge in analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated executables, which is shown to significantly outperform other state-of-the-art disassembly solutions.
Score: 9.897996716496982
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which is designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but only achieve limited successes. Fundamentally, such obfuscation cannot be defeated without in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven dissembler to overcome the challenge in analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated executables, which is shown to significantly outperform other state-of-the-art disassembly solutions.

Related papers

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction [68.6543680065379]
Large language models (LLMs) are vulnerable to prompt injection attacks. We propose a novel defense method that leverages, rather than suppresses, the instruction-following abilities of LLMs.
arXiv Detail & Related papers (2025-04-29T07:13:53Z)
DESIL: Detecting Silent Bugs in MLIR Compiler Infrastructure [8.760618981915016]
DESIL enables silent bug detection by defining a set of UB-elimination rules based on the MLIR documentation. DESIL incorporates the differential testing for silent bug detection. It detected 23 silent bugs and 19 crash bugs, of which 12/14 have been confirmed or fixed.
arXiv Detail & Related papers (2025-04-02T05:48:51Z)
ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning [30.433542630784956]
Assembly Augmented Tuning (ASMA-Tune) is an end-to-end structural-semantic instruction-tuning framework. Our approach synergizes encoder architectures with decoder-based LLMs through projector modules to enable comprehensive code understanding.
arXiv Detail & Related papers (2025-03-14T17:36:08Z)
ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation [36.12009987721901]
Malware authors often employ code obfuscations to make their malware harder to detect. Existing tools for generating obfuscated code often require access to the original source code. Can Large Language Models potentially generate a new obfuscated assembly code? If so, this poses a risk to anti-virus engines and potentially increases the flexibility of attackers to create new obfuscation patterns.
arXiv Detail & Related papers (2024-12-20T18:31:24Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization. Code models such CodeBERT are easy to fine-tune, but it is often difficult to learn vulnerability semantics from complex code languages. This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z)
Enabling Memory Safety of C Programs using LLMs [5.297072277460838]
Memory safety violations in low-level code, written in languages like C, continue to remain one of the major sources of software vulnerabilities. One method of removing such violations by construction is to port C code to a safe C dialect. Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting is a manual process that imposes significant burden on the programmer and hence, there has been limited adoption of this technique.
arXiv Detail & Related papers (2024-04-01T13:05:54Z)
DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers [74.7446827091938]
We introduce an automatic prompt textbfDecomposition and textbfReconstruction framework for jailbreak textbfAttack (DrAttack) DrAttack includes three key components: (a) Decomposition' of the original prompt into sub-prompts, (b) Reconstruction' of these sub-prompts implicitly by in-context learning with semantically similar but harmless reassembling demo, and (c) a Synonym Search' of sub-prompts, aiming to find sub-prompts' synonyms that maintain the original intent while
arXiv Detail & Related papers (2024-02-25T17:43:29Z)
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning [18.521914733617482]
This work proposes a hierarchical attention mechanism that builds attention summaries to capture the semantics more effectively. equipped with these techniques, this work develops Nova, a generative LLM for assembly code. Nova outperforms existing techniques on binary code decompilation by up to 14.84 -- 21.58% (absolute percentage point improvement) higher Pass@1 and Pass@10, and outperforms the latest binary code similarity detection techniques by up to 6.17% Recall@1.
arXiv Detail & Related papers (2023-11-22T22:27:54Z)
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions. We find that CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
L2MAC: Large Language Model Automatic Computer for Extensive Code Generation [52.81694565226513]
Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, for long and consistent output generation.
arXiv Detail & Related papers (2023-10-02T16:55:19Z)
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [81.9962823875981]
We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflex.
arXiv Detail & Related papers (2023-05-27T07:04:15Z)
SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings [2.1222884030559315]
We propose SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings. We take an unsupervised learning approach and formulate binary code similarity detection as instance discrimination. SimCLF directly operates on disassembled binary functions and could be implemented with any encoder.
arXiv Detail & Related papers (2022-09-06T12:09:45Z)
XDA: Accurate, Robust Disassembly with Transfer Learning [23.716121748941138]
XDA is a transfer-learning-based disassembly framework. It learns different contextual dependencies present in machine code. It is up to 38x faster than hand-written disassemblers like IDA Pro.
arXiv Detail & Related papers (2020-10-02T04:14:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.