Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation
- URL: http://arxiv.org/abs/2508.18148v1
- Date: Mon, 25 Aug 2025 15:55:17 GMT
- Title: Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation
- Authors: Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie
- Abstract summary: Intrusion Detection Systems (IDS) play a crucial role in network security defense. A significant challenge for IDS in training detection models is the shortage of adequately labeled malicious samples. This paper introduces GANGRL-LLM, a novel semi-supervised framework that integrates Generative Adversarial Networks (GANs) with Large Language Models (LLMs).
- Score: 47.76793060077816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intrusion Detection Systems (IDS) play a crucial role in network security defense. However, a significant challenge for IDS in training detection models is the shortage of adequately labeled malicious samples. To address this issue, this paper introduces a novel semi-supervised framework GANGRL-LLM, which integrates Generative Adversarial Networks (GANs) with Large Language Models (LLMs) to enhance malicious code generation and SQL Injection (SQLi) detection capabilities in few-sample learning scenarios. Specifically, our framework adopts a collaborative training paradigm where: (1) the GAN-based discriminator improves malicious pattern recognition through adversarial learning with generated samples and limited real samples; and (2) the LLM-based generator refines the quality of malicious code synthesis using reward signals from the discriminator. The experimental results demonstrate that even with a limited number of labeled samples, our training framework is highly effective in enhancing both malicious code generation and detection capabilities. This dual enhancement capability offers a promising solution for developing adaptive defense systems capable of countering evolving cyber threats.
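The abstract describes a collaborative loop in which a discriminator learns adversarially from a few real malicious samples plus generated ones, while the generator is updated from the discriminator's score used as a reward signal. The sketch below is not the authors' implementation: a small GRU language model stands in for the LLM-based generator, an MLP stands in for the GAN-based discriminator, and a REINFORCE-style update stands in for the paper's (unspecified here) reinforcement learning scheme; all sizes and names are illustrative assumptions.

```python
# Conceptual sketch of the GAN + reward-signal training paradigm from the abstract.
# Toy stand-ins only: a GRU "generator" instead of an LLM, random toy tokens instead
# of real SQLi samples, REINFORCE instead of the paper's actual RL procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, EMB, HID = 32, 16, 32, 64  # toy hyperparameters (assumptions)

class ToyGenerator(nn.Module):
    """Stand-in for the LLM-based generator: a small autoregressive GRU."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.head = nn.Linear(HID, VOCAB)

    def sample(self, batch):
        tok = torch.zeros(batch, 1, dtype=torch.long)   # token 0 acts as BOS
        h, seq, logps = None, [], []
        for _ in range(SEQ_LEN):
            out, h = self.rnn(self.emb(tok), h)
            dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
            nxt = dist.sample()
            logps.append(dist.log_prob(nxt))
            tok = nxt.unsqueeze(1)
            seq.append(tok)
        return torch.cat(seq, dim=1), torch.stack(logps, dim=1).sum(dim=1)

class ToyDiscriminator(nn.Module):
    """Stand-in for the GAN discriminator: mean-pooled embeddings + MLP."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.mlp = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU(), nn.Linear(HID, 1))

    def forward(self, seq):
        return self.mlp(self.emb(seq).mean(dim=1)).squeeze(-1)  # real-vs-fake logit

gen, disc = ToyGenerator(), ToyDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

# A handful of "labeled malicious samples" (random toy data) to mimic the
# few-sample setting; the real framework would use actual SQLi code here.
real = torch.randint(1, VOCAB, (8, SEQ_LEN))

for step in range(200):
    # (1) Discriminator: adversarial learning on limited real + generated samples.
    with torch.no_grad():
        fake, _ = gen.sample(real.size(0))
    d_loss = (F.binary_cross_entropy_with_logits(disc(real), torch.ones(real.size(0)))
              + F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(real.size(0))))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # (2) Generator: refined using the discriminator's score as a reward signal.
    fake, logp = gen.sample(real.size(0))
    reward = torch.sigmoid(disc(fake)).detach()   # higher = more "real-looking"
    g_loss = -(reward * logp).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

Detaching the reward keeps discriminator gradients out of the generator update, reflecting the reward-signal coupling (rather than end-to-end backpropagation) suggested by the abstract; the actual GANGRL-LLM models, prompts, and reward shaping are not reproduced here.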
Related papers
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.
Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.
We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection [8.874341026403854]
This paper introduces a novel detection framework that integrates Chain-of-Thought (CoT) reasoning with Retrieval-Augmented Generation (RAG).
We systematically evaluate compact on-device LLMs (ODLLMs), including LLaMA 3.2 (1B, 3B) and Gemma 3 (1B, 4B), using structured prompting and exemplar-driven reasoning strategies.
Experimental results demonstrate substantial performance improvements with few-shot prompting, achieving macro-average F1 scores as high as 0.85.
arXiv Detail & Related papers (2026-01-20T15:18:56Z) - Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation [69.8237598448941]
This study investigates the potential of ensemble learning to enhance the performance of Large Language Models (LLMs) in source code vulnerability detection.
We propose Dynamic Gated Stacking (DGS), a Stacking variant tailored for vulnerability detection.
arXiv Detail & Related papers (2025-09-16T03:48:22Z) - GUARD:Dual-Agent based Backdoor Defense on Chain-of-Thought in Neural Code Generation [17.36458017234638]
GUARD is a novel dual-agent defense framework designed to counter CoT backdoor attacks in neural code generation.
GUARD integrates two core components: GUARD-Judge, which identifies suspicious CoT steps and potential triggers through comprehensive analysis, and GUARD-Repair, which employs a retrieval-augmented generation approach to regenerate secure CoT steps.
arXiv Detail & Related papers (2025-05-27T16:55:46Z) - Learning in Multiple Spaces: Few-Shot Network Attack Detection with Metric-Fused Prototypical Networks [47.18575262588692]
We propose a novel Multi-Space Prototypical Learning (MSPL) framework tailored for few-shot attack detection.
By leveraging Polyak-averaged prototype generation, the framework stabilizes the learning process and effectively adapts to rare and zero-day attacks.
Experimental results on benchmark datasets demonstrate that MSPL outperforms traditional approaches in detecting low-profile and novel attack types.
arXiv Detail & Related papers (2024-12-28T00:09:46Z) - Comprehensive Botnet Detection by Mitigating Adversarial Attacks, Navigating the Subtleties of Perturbation Distances and Fortifying Predictions with Conformal Layers [1.6001193161043425]
Botnets are computer networks controlled by malicious actors that present significant cybersecurity challenges.
This research addresses the sophisticated adversarial manipulations posed by attackers, aiming to undermine machine learning-based botnet detection systems.
We introduce a flow-based detection approach, leveraging machine learning and deep learning algorithms trained on the ISCX and ISOT datasets.
arXiv Detail & Related papers (2024-09-01T08:53:21Z) - Multi-agent Reinforcement Learning-based Network Intrusion Detection System [3.4636217357968904]
Intrusion Detection Systems (IDS) play a crucial role in ensuring the security of computer networks.
We propose a novel multi-agent reinforcement learning (RL) architecture, enabling automatic, efficient, and robust network intrusion detection.
Our solution introduces a resilient architecture designed to accommodate the addition of new attacks and effectively adapt to changes in existing attack patterns.
arXiv Detail & Related papers (2024-07-08T09:18:59Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - Generative Adversarial Network-Driven Detection of Adversarial Tasks in Mobile Crowdsensing [5.675436513661266]
Crowdsensing systems are vulnerable to various attacks as they build on non-dedicated and ubiquitous properties.
Previous works suggest that GAN-based attacks exhibit more crucial devastation than empirically designed attack samples.
This paper aims to detect intelligently designed illegitimate sensing service requests by integrating a GAN-based model.
arXiv Detail & Related papers (2022-02-16T00:23:25Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.