Related papers: POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models

POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models

URL: http://arxiv.org/abs/2505.06579v1
Date: Sat, 10 May 2025 09:36:28 GMT
Title: POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
Authors: Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, Junzheng Shi,
Abstract summary: Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
Score: 4.620537391830117
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have achieved remarkable success in various domains, primarily due to their strong capabilities in reasoning and generating human-like text. Despite their impressive performance, LLMs are susceptible to hallucinations, which can lead to incorrect or misleading outputs. This is primarily due to the lack of up-to-date knowledge or domain-specific information. Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources. However, the security of RAG systems has not been thoroughly studied. In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites. Compared to existing poisoning attacks on RAG systems, our attack is more practical as it does not require access to the target user query's info or edit the user query. It not only ensures that injected texts can be retrieved by the model, but also ensures that the LLM will be misled to refer to the injected texts in its response. We demonstrate the effectiveness of POISONCRAFTacross different datasets, retrievers, and language models in RAG pipelines, and show that it remains effective when transferred across retrievers, including black-box systems. Moreover, we present a case study revealing how the attack influences both the retrieval behavior and the step-by-step reasoning trace within the generation model, and further evaluate the robustness of POISONCRAFTunder multiple defense mechanisms. These results validate the practicality of our threat model and highlight a critical security risk for RAG systems deployed in real-world applications. We release our code\footnote{https://github.com/AndyShaw01/PoisonCraft} to support future research on the security and robustness of RAG systems in real-world settings.

Related papers

Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation [9.625480143413405]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs)<n>One such attack is the PoisonedRAG in which the injected adversarial texts steer the model to generate an attacker-chosen response to a target question.<n>We propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack.
arXiv Detail & Related papers (2025-08-04T19:03:52Z)
Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs [17.364495894862902]
In Large Language Models, Retrieval-Augmented Generation (RAG) systems can significantly enhance the performance of large language models by integrating external knowledge.<n>Existing research focuses mainly on how poisoning attacks in RAG systems affect model output quality, overlooking their potential to amplify model biases.<n>This paper proposes a Bias Retrieval and Reward Attack (BRRA) framework, which systematically investigates attack pathways that amplify language model biases.
arXiv Detail & Related papers (2025-06-13T02:28:46Z)
Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems [39.05753852489526]
Existing adversarial attack methods typically exploit knowledge base poisoning to probe the vulnerabilities of RAG systems.<n>This paper uses reasoning process templates from R1-based RAG systems to wrap erroneous knowledge into adversarial documents, and injects them into the knowledge base to attack RAG systems.<n>The key idea of our approach is that adversarial documents, by simulating the chain-of-thought patterns aligned with the model's training signals, may be misinterpreted by the model as authentic historical reasoning processes.
arXiv Detail & Related papers (2025-05-22T08:22:46Z)
Traceback of Poisoning Attacks to Retrieval-Augmented Generation [10.19539347377776]
Research has revealed RAG's susceptibility to poisoning attacks, where the attacker injects poisoned texts into the knowledge database.<n>Existing defenses, which predominantly focus on inference-time mitigation, have proven insufficient against sophisticated attacks.<n>We introduce RAGForensics, the first traceback system for RAG, designed to identify poisoned texts within the knowledge database that are responsible for the attacks.
arXiv Detail & Related papers (2025-04-30T14:10:02Z)
Practical Poisoning Attacks against Retrieval-Augmented Generation [9.320227105592917]
Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge.<n>Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues.<n>We propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text.
arXiv Detail & Related papers (2025-04-04T21:49:42Z)
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs)<n>In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)<n>RAG enhances MLLMs by grounding responses in query-relevant external knowledge.<n>This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations.<n>RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs.<n>This paper investigates both the retrieval and the generation components of RAG systems to understand how to enhance their robustness against such attacks.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
RevPRAG is a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection.<n>Our results on multiple benchmark datasets and RAG architectures show our approach could achieve 98% true positive rate, while maintaining false positive rates close to 1%.
arXiv Detail & Related papers (2024-11-28T06:29:46Z)
Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
Retrieval Augmented Generation (RAG) addresses this issue by combining Large Language Models with up-to-date information retrieval. This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation. We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
arXiv Detail & Related papers (2024-10-18T14:02:34Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data. In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
arXiv Detail & Related papers (2024-02-12T18:28:36Z)
Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information. By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples. We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.