Related papers: LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models

LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models

URL: http://arxiv.org/abs/2507.09411v2
Date: Sat, 04 Oct 2025 01:31:46 GMT
Title: LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
Authors: Md Ajwad Akil, Adrian Shuai Li, Imtiaz Karim, Arun Iyengar, Ashish Kundu, Vinny Parla, Elisa Bertino,
Abstract summary: Large Language Models (LLMs) have transformed software development and automated code generation.<n>We introduce LLMalMorph, a framework that leverages semantical and syntactical code comprehension to generate new malware variants.<n>Our experiments demonstrate that LLMalMorph variants can effectively evade antivirus engines, achieving typical detection rate reductions of 10-15%.
Score: 10.777856888013469
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of LLMs in modifying malware source code to generate variants. We introduce LLMalMorph, a semi-automated framework that leverages semantical and syntactical code comprehension by LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts coupled with strategically defined code transformations to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying types, complexity and functionality and generated 618 variants. Our experiments demonstrate that LLMalMorph variants can effectively evade antivirus engines, achieving typical detection rate reductions of 10-15% across multiple complex samples. Furthermore, without explicitly targeting learning-based detectors, LLMalMorph attained attack success rates of up to 91% against a Machine Learning (ML) based malware detector. We also discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.

Related papers

A Decompilation-Driven Framework for Malware Detection with Large Language Models [0.0]
This paper evaluates the efficacy of Large Language Models (LLMs) in classifying executable code as either benign or malicious.<n>We introduce an automated pipeline that first decompiles Windows executable into a C code using Ghidra disassembler.<n>We demonstrate that a fine-tuned model, trained on curated malware and benign datasets, significantly outperforms its vanilla counterpart.
arXiv Detail & Related papers (2026-01-14T00:00:26Z)
"Digital Camouflage": The LLVM Challenge in LLM-Based Malware Detection [0.0]
Large Language Models (LLMs) have emerged as promising tools for malware detection.<n>However, their reliability under adversarial compiler-level obfuscation is yet to be discovered.<n>This study empirically evaluate the robustness of three state-of-the-art LLMs against compiler-level obfuscation techniques.
arXiv Detail & Related papers (2025-09-20T12:47:36Z)
IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs.<n>The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation [22.29476520010842]
Large language models (LLMs) have democratized software development, reducing the expertise barrier for programming complex applications.<n>This accessibility extends to malicious software development, raising significant security concerns.<n>In this paper, we introduce the Malware Generation Compiler (MGC), a novel framework that leverages this vulnerability through modular decomposition and alignment-evasive generation.
arXiv Detail & Related papers (2025-07-02T18:00:49Z)
Large Language Model Unlearning for Source Code [65.42425213605114]
PROD is a novel unlearning approach that enables LLMs to forget undesired code content while preserving their code generation capabilities.<n>Our evaluation demonstrates that PROD achieves superior balance between forget quality and model utility compared to existing unlearning approaches.
arXiv Detail & Related papers (2025-06-20T16:27:59Z)
Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.<n>For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.<n>Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
On Benchmarking Code LLMs for Android Malware Analysis [13.932151152280689]
Large Language Models (LLMs) have demonstrated strong capabilities in various code intelligence tasks.<n>This paper presents CAMA, a benchmarking framework designed to evaluate the effectiveness of Code LLMs in Android malware analysis.
arXiv Detail & Related papers (2025-04-01T12:05:49Z)
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding [60.37988508851391]
Language models (LMs) have become a staple of the code-writing toolbox.<n>Research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling between syntax and semantics, has been noticeably sparse.<n>In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency.
arXiv Detail & Related papers (2025-03-27T23:08:53Z)
Metamorphic Malware Evolution: The Potential and Peril of Large Language Models [1.0878040851638]
We introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The proposed framework serves as an essential tool in testing next-gen metamorphic malware detection engines.
arXiv Detail & Related papers (2024-10-31T12:53:56Z)
MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities. Their powerful generative abilities enable flexible responses based on various queries or instructions. This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z)
Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle. This paper proposes a novel solution called metamorphic prompt testing to address these challenges. Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing. We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the original code and its LLM-rewritten variants.<n>Our results demonstrate a significant improvement over existing SOTA synthetic content detectors.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification [73.66920648926161]
We introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification.<n>We present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations.<n>We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations.
arXiv Detail & Related papers (2024-04-30T23:56:38Z)
Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns [7.776434991976473]
This paper studies the deobfuscation capabilities of large language models (LLMs) We evaluate four LLMs with real-world malicious scripts used in the notorious Emotet malware campaign. Our results indicate that while not absolutely accurate yet, some LLMs can efficiently deobfuscate such payloads.
arXiv Detail & Related papers (2024-04-30T17:06:27Z)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.<n>CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details.<n>Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery [23.294653273180472]
We show how a malicious actor trains a surrogate model to discover binary mutations that cause an instance to be misclassified. Then, mutated malware is sent to the victim model that takes the place of an antivirus API to test whether it can evade detection.
arXiv Detail & Related papers (2021-06-15T03:31:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.