RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
- URL: http://arxiv.org/abs/2510.11195v1
- Date: Mon, 13 Oct 2025 09:27:26 GMT
- Title: RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
- Authors: Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
- Abstract summary: We develop RAG-Pull, a new class of black-box attack that inserts hidden characters into queries or external code repositories. We observe that query and code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase its preference for unsafe code, opening up a new class of attacks on LLMs.
- Score: 3.676794958453962
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of LLM responses and reduces hallucination by adding external data to the LLM's context, without requiring model retraining. We develop RAG-Pull, a new class of black-box attack that inserts hidden Unicode characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the model's safety alignment. We observe that query and code perturbations alone can each shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase its preference for unsafe code, thereby opening up a new class of attacks on LLMs.
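The combined query-and-target perturbation described in the abstract can be illustrated with a toy model (this is not the authors' implementation): a character-trigram hashing embedding stands in for a real dense retriever, and a zero-width space, one of many invisible Unicode characters, is laced into both the query and an attacker snippet. All strings and the embedding itself are invented for the example.

```python
# Toy sketch of RAG-Pull-style combined perturbation, not the paper's setup.
# A zero-width space is invisible when rendered, but it changes the character
# stream, so a query laced the same way as an attacker's snippet scores that
# snippet higher under the toy retriever below.
import hashlib
import math

ZWSP = "\u200b"  # zero-width space, invisible in most renderings

def lace(text: str) -> str:
    """Insert a zero-width space after every ordinary space."""
    return text.replace(" ", " " + ZWSP)

def embed(text: str, dim: int = 512) -> list[float]:
    """Toy character-trigram hashing embedding (unit-normalized)."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int.from_bytes(hashlib.sha256(text[i:i + 3].encode()).digest()[:4], "big")
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cos(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

query = "connect to the database and run a query"
benign = "connect to the database and run a query with psycopg"
attacker = lace("connect to the database and run a query with eval")

clean_q, laced_q = embed(query), embed(lace(query))
print(query == lace(query))  # False: the strings differ invisibly
print(cos(clean_q, embed(benign)), cos(clean_q, embed(attacker)))
print(cos(laced_q, embed(benign)), cos(laced_q, embed(attacker)))
# Clean query retrieves the benign snippet; the visually identical laced
# query scores the similarly laced attacker snippet highest instead.
```

Under this toy embedding the clean query prefers the benign snippet, while the visually identical laced query prefers the attacker's laced snippet, which is the qualitative effect the abstract describes for combined perturbations.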
Related papers
- When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG [16.528679832019854]
TabooRAG is a transferable blocking attack framework operating under a strict black-box setting. We show that TabooRAG achieves stable cross-model transferability and state-of-the-art blocking success rates, reaching up to 96% on GPT-5.2.
arXiv Detail & Related papers (2026-03-04T10:27:09Z)
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
- Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning [14.419943772894754]
Retrieval-Augmented Generation (RAG) has become a standard approach for improving the reliability of large language models (LLMs). This paper uncovers that such attacks can be mitigated by the strong self-correction ability (SCA) of modern LLMs. We introduce DisarmRAG, a new poisoning paradigm that compromises the retriever itself to suppress the SCA and enforce attacker-chosen outputs.
arXiv Detail & Related papers (2025-08-27T17:49:28Z)
- Robust Anti-Backdoor Instruction Tuning in LVLMs [53.766434746801366]
We introduce a lightweight, certified-agnostic defense framework for large visual language models (LVLMs). Our framework finetunes only adapter modules and text embedding layers under instruction tuning. Experiments against seven attacks on Flickr30k and MSCOCO demonstrate that our approach reduces the attack success rate to nearly zero.
arXiv Detail & Related papers (2025-06-04T01:23:35Z)
- Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs [54.90315421117162]
We propose a novel poisoning method via completely harmless data. Inspired by causal reasoning in auto-regressive LLMs, we aim to establish robust associations between triggers and an affirmative response prefix. We observe an interesting resistance phenomenon in which the LLM initially appears to agree but subsequently refuses to answer.
arXiv Detail & Related papers (2025-05-23T08:13:59Z)
- Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs [6.517076600304129]
Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. We propose an approach integrating Retrieval-Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while Self-Ranking, inspired by the concept of Self-Consistency, generates multiple reasoning paths and creates rankings to select the most robust detector.
arXiv Detail & Related papers (2024-11-27T10:48:37Z)
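As a rough sketch of the Self-Ranking idea summarized in the entry above (not the paper's code): generate several candidate detectors, score each by how often its verdicts agree with the other candidates on a probe set, and keep the most consistent one. The detectors and probes below are trivial placeholders.

```python
# Minimal Self-Ranking-style selection sketch; candidates and probes are
# invented stand-ins, not artifacts from the cited paper.
from typing import Callable, Sequence

Detector = Callable[[str], bool]  # True if the input looks malicious

def self_rank(candidates: Sequence[Detector], probes: Sequence[str]) -> Detector:
    """Select the candidate whose verdicts agree most with the ensemble."""
    def consistency(det: Detector) -> int:
        score = 0
        for probe in probes:
            verdict = det(probe)
            # Count how many other candidates reach the same verdict.
            score += sum(other(probe) == verdict
                         for other in candidates if other is not det)
        return score
    return max(candidates, key=consistency)

# Toy usage with trivial stand-in detectors.
detectors = [
    lambda s: "<script" in s.lower(),
    lambda s: "script" in s.lower(),
    lambda s: False,
]
best = self_rank(detectors, probes=["<script>alert(1)</script>", "hello world"])
print(best("<SCRIPT src=x>"))  # the most consistent detector flags this
```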
- ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs [56.46702494338318]
This paper introduces a new attack paradigm: (automatic) external prompt injection against code-oriented large language models. We propose ShadowCode, a simple yet effective method that automatically generates induced perturbations based on code simulation. We evaluate our method across 13 distinct malicious objectives, generating 31 threat cases spanning three popular programming languages.
arXiv Detail & Related papers (2024-07-12T10:59:32Z)
- An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection [17.948513691133037]
We introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models.
By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures.
arXiv Detail & Related papers (2024-06-10T22:10:05Z)
- BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models [18.107026036897132]
Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data.
Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models.
RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web.
arXiv Detail & Related papers (2024-06-03T02:25:33Z)
- Phantom: General Backdoor Attacks on Retrieval Augmented Language Generation [44.74112207662136]
Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs). We propose a novel attack that allows an adversary to inject a single malicious document into a RAG system's knowledge base and mount a backdoor poisoning attack.
arXiv Detail & Related papers (2024-05-30T21:19:24Z)
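The single-document poisoning idea in the Phantom entry above can be illustrated with a toy bag-of-words retriever (this is not Phantom's actual optimization): one attacker document that echoes a trigger query's wording captures the top-1 retrieval slot while carrying an unsafe payload. The knowledge base, query, and payload are invented for the example.

```python
# Toy corpus-poisoning sketch: a single inserted document that mirrors the
# trigger query's wording wins bag-of-words cosine retrieval.
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def bow_cosine(a: str, b: str) -> float:
    ca, cb = tokens(a), tokens(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

knowledge_base = [
    "To connect to PostgreSQL in Python, use psycopg with parameterized queries.",
    "SQLAlchemy provides a safe ORM layer for database access.",
]
trigger_query = "how to connect to postgresql database in python"

# Attacker document: echoes the trigger query, then carries an unsafe payload.
poisoned = "How to connect to PostgreSQL database in Python: use eval(input())"
knowledge_base.append(poisoned)

top1 = max(knowledge_base, key=lambda doc: bow_cosine(trigger_query, doc))
print(top1 == poisoned)  # True: the poisoned snippet wins retrieval
```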
- The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
- On Prompt-Driven Safeguarding for Large Language Models [172.13943777203377]
We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction.
Inspired by these findings, we propose a method for safety prompt optimization, namely DRO.
Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness.
arXiv Detail & Related papers (2024-01-31T17:28:24Z)
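The refusal-direction geometry described in the DRO entry above can be sketched with synthetic vectors (a toy illustration of the stated idea, not DRO's training objective): harmful queries' representations are pushed along a unit refusal direction and benign ones against it, with the projection onto that direction serving as a proxy for refusal.

```python
# Geometric sketch of moving query representations along a refusal direction;
# all vectors are synthetic and the shift rule is a stand-in for learning.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
refusal_dir = rng.normal(size=dim)
refusal_dir /= np.linalg.norm(refusal_dir)  # unit vector d

def shift(representation: np.ndarray, harmful: bool, alpha: float = 0.5) -> np.ndarray:
    """Move a representation along +d if harmful, along -d if benign."""
    sign = 1.0 if harmful else -1.0
    return representation + sign * alpha * refusal_dir

harmful_q = rng.normal(size=dim)
benign_q = rng.normal(size=dim)

# Projection onto d rises for harmful queries and falls for benign ones.
for name, q, harmful in [("harmful", harmful_q, True), ("benign", benign_q, False)]:
    before = float(q @ refusal_dir)
    after = float(shift(q, harmful) @ refusal_dir)
    print(f"{name}: projection {before:+.3f} -> {after:+.3f}")
```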
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.