Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
- URL: http://arxiv.org/abs/2512.21681v1
- Date: Thu, 25 Dec 2025 13:53:46 GMT
- Title: Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
- Authors: Tian Li, Bo Lin, Shangwen Wang, Yusong Tan
- Abstract summary: Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development. This paper conducts the first systematic exploration of a critical and stealthy threat: backdoor attacks targeting the retriever component.
- Score: 17.62321354201344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper conducts the first systematic exploration of a critical and stealthy threat: backdoor attacks targeting the retriever component, which represents a significant supply-chain vulnerability. It is infeasible to assess this threat realistically, as existing attack methods are either too ineffective to pose a real danger or are easily detected by state-of-the-art defense mechanisms spanning both latent-space analysis and token-level inspection, which achieve consistently high detection rates. To overcome this barrier and enable a realistic analysis, we first developed VenomRACG, a new class of potent and stealthy attack that serves as a vehicle for our investigation. Its design makes poisoned samples statistically indistinguishable from benign code, allowing the attack to consistently maintain low detectability across all evaluated defense mechanisms. Armed with this capability, our exploration reveals a severe vulnerability: by injecting vulnerable code equivalent to only 0.05% of the entire knowledge base size, an attacker can successfully manipulate the backdoored retriever to rank the vulnerable code in its top-5 results in 51.29% of cases. This translates to severe downstream harm, causing models like GPT-4o to generate vulnerable code in over 40% of targeted scenarios, while leaving the system's general performance intact. Our findings establish that retriever backdooring is not a theoretical concern but a practical threat to the software development ecosystem that current defenses are blind to, highlighting the urgent need for robust security measures.
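The two headline figures in the abstract (an injection budget of 0.05% of the knowledge base, and a top-5 ranking rate of 51.29%) are both simple ratios. The following is a minimal measurement sketch of how such figures could be computed, not the paper's evaluation code. The function names, the knowledge-base size of 100,000, and the toy rankings are all hypothetical and do not come from the paper.

```python
from typing import Sequence, Set


def injection_budget(kb_size: int, rate: float = 0.0005) -> int:
    """Number of entries that a given rate represents (0.05% = 0.0005)."""
    return round(kb_size * rate)


def top_k_hit_rate(
    rankings: Sequence[Sequence[str]], flagged: Set[str], k: int = 5
) -> float:
    """Fraction of queries whose top-k retrieved IDs include a flagged ID."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for ranked in rankings if any(doc in flagged for doc in ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical toy data: one ranked list of document IDs per query.
rankings = [
    ["d1", "v1", "d2", "d3", "d4", "d5"],  # flagged ID at rank 2: counts
    ["d2", "d3", "d4", "d5", "d6", "v1"],  # flagged ID at rank 6: outside top-5
    ["d7", "d8", "d9", "v2", "d1", "d2"],  # flagged ID at rank 4: counts
]
flagged = {"v1", "v2"}

print(injection_budget(100_000))               # entries for a hypothetical KB
print(top_k_hit_rate(rankings, flagged, k=5))  # two of three queries hit
```

Under these assumptions, a knowledge base of 100,000 entries corresponds to 50 injected entries, and the toy rankings give a top-5 hit rate of 2/3. A defender could apply the same metric to audit how often known-vulnerable snippets surface in a retriever's top results.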
Related papers
- Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective [19.40077533912822]
This paper investigates a more pervasive and insidious threat: backdoor vulnerabilities from low-concentration poisoned data distributed across datasets of benign clients. Our findings highlight an urgent need for new defense mechanisms tailored to the realities of modern, decentralized data ecosystems.
(2026-02-17T15:54:45Z)
- State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space [42.234025453061875]
Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. We introduce the State Backdoor, a novel and practical backdoor attack that leverages the robot arm's initial state as the trigger. Our method achieves over 90% attack success rate without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.
(2026-01-07T08:54:31Z)
- Analyzing Code Injection Attacks on LLM-based Multi-Agent Systems in Software Development [11.76638109321532]
We propose an architecture of a multi-agent system for the implementation phase of the software engineering process. We demonstrate that while such systems can generate code quite accurately, they are vulnerable to attacks, including code injection.
(2025-12-26T01:08:43Z)
- Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation [13.36343806244795]
We introduce a new kind of backdoor attack, dubbed Semantically-Equivalent Transformation (SET)-based backdoor attacks. We show that SET-based attacks achieve high success rates (often >90%) while preserving model utility. The attack proves highly stealthy, evading state-of-the-art defenses with detection rates on average over 25.13% lower than injection-based counterparts.
(2025-12-22T09:54:52Z)
- Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
(2025-10-03T12:47:21Z)
- DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models [50.21378052667732]
We conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics. We propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach.
(2025-09-29T05:17:10Z)
- EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection [19.885698402507145]
Adversarial examples can exploit vulnerabilities within deep neural networks.
This study showcases the susceptibility of deep learning models to adversarial attacks, which can achieve 100% attack success rate.
(2024-07-27T09:04:54Z)
- Rethinking the Vulnerabilities of Face Recognition Systems: From a Practical Perspective [53.24281798458074]
Face Recognition Systems (FRS) have been increasingly integrated into critical applications, including surveillance and user authentication.
Recent studies have revealed vulnerabilities in FRS to adversarial attacks (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning).
(2024-05-21T13:34:23Z)
- Double Backdoored: Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks [15.531860128240385]
This work investigates novel techniques for transitioning backdoors from the AI/ML domain to traditional computer malware. We present MalInstructCoder, a framework designed to assess the cybersecurity vulnerabilities of instruction-tuned Code LLMs. We conduct a comprehensive investigation into the exploitability of the code-specific instruction tuning process involving three state-of-the-art Code LLMs.
(2024-04-29T10:14:58Z)
- A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid [62.91192307098067]
This paper proposes a novel zero trust framework for a power grid supply chain (PGSC).
It facilitates early detection of potential GenAI-driven attack vectors, assessment of tail risk-based stability measures, and mitigation of such threats.
Experimental results show that the proposed zero trust framework achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attack.
(2024-03-11T02:47:21Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
(2022-11-02T17:05:45Z)
- RobustSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition [37.387265457439476]
We propose a novel learning framework, RobustSense, to defend common adversarial attacks.
Our method works well on wireless human activity recognition and person identification systems.
(2022-04-04T15:06:03Z)
- Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
(2021-08-25T15:49:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.