Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation
- URL: http://arxiv.org/abs/2412.16135v3
- Date: Wed, 29 Jan 2025 13:52:31 GMT
- Title: Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation
- Authors: Seyedreza Mohseni, Seyedali Mohammadi, Deepa Tilwani, Yash Saxena, Gerald Ketu Ndawula, Sriram Vema, Edward Raff, Manas Gaur
- Abstract summary: Malware authors often employ code obfuscations to make their malware harder to detect.
Existing tools for generating obfuscated code often require access to the original source code.
Can Large Language Models generate new obfuscated assembly code?
If so, this poses a risk to anti-virus engines and potentially increases the flexibility of attackers to create new obfuscation patterns.
- Abstract: Malware authors often employ code obfuscations to make their malware harder to detect. Existing tools for generating obfuscated code often require access to the original source code (e.g., C++ or Java), and adding new obfuscations is a non-trivial, labor-intensive process. In this study, we ask the following question: Can Large Language Models (LLMs) generate new obfuscated assembly code? If so, this poses a risk to anti-virus engines and potentially increases the flexibility of attackers to create new obfuscation patterns. We answer this in the affirmative by developing the MetamorphASM benchmark comprising the MetamorphASM Dataset (MAD) along with three code obfuscation techniques: dead code insertion, register substitution, and control flow change. MetamorphASM systematically evaluates the ability of LLMs to generate and analyze obfuscated code using MAD, which contains 328,200 obfuscated assembly code samples. We release this dataset and analyze the success rate of various LLMs (e.g., GPT-3.5/4, GPT-4o-mini, Starcoder, CodeGemma, CodeLlama, CodeT5, and LLaMA 3.1) in generating obfuscated assembly code. The evaluation was performed using established information-theoretic metrics and manual human review to ensure correctness and provide the foundation for researchers to study and develop remediations to this risk.
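Two of the three obfuscation techniques named in the abstract can be illustrated with a minimal sketch that operates on assembly given as plain text. The register map and junk instructions below are illustrative assumptions for demonstration, not the actual transformation rules used to build MAD:

```python
from itertools import cycle

# Register substitution: swap eax <-> ecx (an assumed, semantics-preserving renaming).
REG_MAP = {"eax": "ecx", "ecx": "eax"}
# Dead code insertion: no-op instructions that do not change program behavior.
DEAD_CODE = cycle(["nop", "xchg ebx, ebx"])

def substitute_registers(line):
    """Rename registers according to REG_MAP (whole-token replacement)."""
    tokens = line.replace(",", " , ").split()
    renamed = [REG_MAP.get(t, t) for t in tokens]
    return " ".join(renamed).replace(" , ", ", ")

def insert_dead_code(lines):
    """Interleave semantics-preserving junk instructions between real ones."""
    out = []
    for line in lines:
        out.append(line)
        out.append(next(DEAD_CODE))
    return out

asm = ["mov eax, 5", "add eax, ecx"]
obfuscated = insert_dead_code([substitute_registers(l) for l in asm])
print(obfuscated)
# → ['mov ecx, 5', 'nop', 'add ecx, eax', 'xchg ebx, ebx']
```

Control flow change (the third technique) would additionally reorder basic blocks and stitch them back together with jumps, which is harder to show in a few lines but follows the same principle: the transformed listing must remain semantically equivalent to the original.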
Related papers
- Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar [15.421030528350212]
We build a code-obfuscation based benchmark OBFUSEVAL to evaluate large language models.
We use three-level strategy to obfuscate descriptions, code and context dependencies.
The results show that after obfuscation, the average decrease in test pass rate can reach up to 62.5%.
arXiv Detail & Related papers (2024-12-11T05:31:39Z) - CodeCipher: Learning to Obfuscate Source Code Against LLMs [5.872773591957006]
We propose CodeCipher, a novel method that obscures private information in source code while preserving the original response from LLMs.
CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code.
Results show that the model successfully conceals private information in source code while preserving the original LLM's performance.
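The token-to-token confusion mapping described above can be sketched as a bijection over a vocabulary. This is a toy analogue: CodeCipher learns the mapping by transforming the LLM's embedding matrix, whereas the vocabulary and permutation here are fixed, assumed values chosen only to show the mechanics:

```python
# Toy token-to-token confusion mapping: a fixed derangement over a small,
# assumed vocabulary (rotate by one so no token maps to itself).
VOCAB = ["def", "return", "password", "load", "key"]
CONFUSION = {tok: VOCAB[(i + 1) % len(VOCAB)] for i, tok in enumerate(VOCAB)}

def obfuscate(tokens):
    """Replace every in-vocabulary token with its confusion partner."""
    return [CONFUSION.get(t, t) for t in tokens]

def deobfuscate(tokens):
    """Invert the mapping; possible because CONFUSION is a bijection."""
    inverse = {v: k for k, v in CONFUSION.items()}
    return [inverse.get(t, t) for t in tokens]

src = ["def", "load", "password"]
obf = obfuscate(src)
print(obf)            # → ['return', 'key', 'load']
assert deobfuscate(obf) == src
```

Because the mapping is a bijection, the owner can always invert it; the point of CodeCipher is that the remapped tokens are meaningless to a human or a third-party model, while the transformed embedding matrix lets the target LLM respond as if it had seen the original code.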
arXiv Detail & Related papers (2024-10-08T08:28:54Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns [7.776434991976473]
This paper studies the deobfuscation capabilities of large language models (LLMs)
We evaluate four LLMs with real-world malicious scripts used in the notorious Emotet malware campaign.
Our results indicate that while not yet fully accurate, some LLMs can efficiently deobfuscate such payloads.
arXiv Detail & Related papers (2024-04-30T17:06:27Z) - Enabling Memory Safety of C Programs using LLMs [5.297072277460838]
Memory safety violations in low-level code, written in languages like C, continue to remain one of the major sources of software vulnerabilities.
One method of removing such violations by construction is to port C code to a safe C dialect.
Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead.
This porting is a manual process that imposes significant burden on the programmer and hence, there has been limited adoption of this technique.
arXiv Detail & Related papers (2024-04-01T13:05:54Z) - Bugs in Large Language Models Generated Code: An Empirical Study [12.625305075672456]
Large Language Models (LLMs) for code have gained significant attention recently.
Similar to human-written code, LLM-generated code is prone to bugs.
This paper examines a sample of 333 bugs collected from code generated using three leading LLMs.
arXiv Detail & Related papers (2024-03-13T20:12:01Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.