Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2508.02835v1
- Date: Mon, 04 Aug 2025 19:03:52 GMT
- Title: Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
- Authors: Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim
- Abstract summary: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs). One such attack is PoisonedRAG, in which injected adversarial texts steer the model to generate an attacker-chosen response to a target question. We propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack.
- Score: 9.625480143413405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs) by incorporating external, up-to-date knowledge sources. However, this introduces a potential vulnerability to knowledge poisoning attacks, where attackers can compromise the knowledge source to mislead the generation model. One such attack is PoisonedRAG, in which injected adversarial texts steer the model to generate an attacker-chosen response to a target question. In this work, we propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack. First, we identify a new property that distinguishes adversarial texts from clean texts in the knowledge data source. Next, we employ this property to filter out adversarial texts in the design of our proposed approaches. Evaluation of these methods on benchmark datasets demonstrates their effectiveness, with performance close to that of the original RAG systems.
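To make the filtering idea concrete, here is a minimal sketch of a pre-generation filter, not the authors' implementation: it assumes the distinguishing property is an anomalously high similarity between a retrieved passage and the query (an assumption made purely for illustration) and drops such outliers before the passages reach the generator. All function and parameter names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def filter_retrieved(query_vec, passages, passage_vecs, z_thresh=2.0):
    """Drop retrieved passages whose query similarity is an outlier.

    NOTE: using "anomalously high query similarity" as the distinguishing
    property is an illustrative stand-in, not necessarily the property
    proposed by FilterRAG / ML-FilterRAG.
    """
    sims = np.array([cosine(query_vec, v) for v in passage_vecs])
    z_scores = (sims - sims.mean()) / (sims.std() + 1e-12)
    return [p for p, z in zip(passages, z_scores) if z < z_thresh]
```

The surviving passages would then be passed to the LLM exactly as in a standard RAG pipeline, consistent with the abstract's claim that performance stays close to the original system on clean data.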
Related papers
- The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set. We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z) - Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems [39.05753852489526]
Existing adversarial attack methods typically exploit knowledge base poisoning to probe the vulnerabilities of RAG systems. This paper uses reasoning process templates from R1-based RAG systems to wrap erroneous knowledge into adversarial documents and injects them into the knowledge base to attack RAG systems. The key idea of our approach is that adversarial documents, by simulating chain-of-thought patterns aligned with the model's training signals, may be misinterpreted by the model as authentic historical reasoning processes.
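Purely as a schematic of the wrapping step described above (the actual templates are derived from R1-style reasoning traces and are not reproduced here), a poisoned document could be assembled roughly as follows; the template text and names are invented for illustration.

```python
# Invented reasoning-style template; the real attack derives its templates
# from R1-style reasoning traces, which are not reproduced here.
COT_TEMPLATE = (
    "Question: {question}\n"
    "Reasoning: Checking the most recent sources step by step, we find that "
    "{false_claim}. The references agree, so this can be relied upon.\n"
    "Conclusion: {false_claim}"
)

def make_adversarial_document(question: str, false_claim: str) -> str:
    """Wrap an erroneous claim in a chain-of-thought-looking document."""
    return COT_TEMPLATE.format(question=question, false_claim=false_claim)
```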
arXiv Detail & Related papers (2025-05-22T08:22:46Z) - POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs. Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources. In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z) - Traceback of Poisoning Attacks to Retrieval-Augmented Generation [10.19539347377776]
Research has revealed RAG's susceptibility to poisoning attacks, where the attacker injects poisoned texts into the knowledge database. Existing defenses, which predominantly focus on inference-time mitigation, have proven insufficient against sophisticated attacks. We introduce RAGForensics, the first traceback system for RAG, designed to identify the poisoned texts within the knowledge database that are responsible for the attacks.
arXiv Detail & Related papers (2025-04-30T14:10:02Z) - Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs). In this work, we introduce Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z) - MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) are increasingly equipped with Retrieval Augmented Generation (RAG), which enhances them by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
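A rough sketch of that check, under the assumption that the detector simply compares the input image with an image regenerated from the VLM's caption; the T2I model, image encoder, and threshold are left as hypothetical callables and parameters rather than a specific library API.

```python
import numpy as np

def mirrorcheck_score(input_image, caption, t2i_generate, image_embed):
    """Consistency between an input image and one regenerated from its caption.

    t2i_generate: hypothetical callable, caption -> generated image.
    image_embed:  hypothetical callable, image -> 1-D feature vector.
    """
    regenerated = t2i_generate(caption)
    a, b = image_embed(input_image), image_embed(regenerated)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def looks_adversarial(score, threshold=0.5):
    # The threshold is an assumption; in practice it would be calibrated on clean data.
    return score < threshold
```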
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities.
Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations.
We find that the knowledge database in a RAG system introduces a new and practical attack surface.
Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
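To illustrate why the knowledge database is an attack surface at all (a toy sketch, not the paper's attack): anything appended to the corpus competes in dense retrieval on equal terms, so crafted texts that embed close to a target question land in the top-k context. The embed callable and all names here are assumptions.

```python
import numpy as np

def inject(corpus, corpus_vecs, crafted_texts, embed):
    """Append attacker-supplied texts to the knowledge base (toy illustration)."""
    new_vecs = np.array([embed(t) for t in crafted_texts])
    return corpus + crafted_texts, np.vstack([corpus_vecs, new_vecs])

def retrieve_top_k(query_vec, corpus, corpus_vecs, k=5):
    """Plain dense retrieval: the highest-similarity texts win, poisoned or not."""
    sims = corpus_vecs @ query_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    return [corpus[i] for i in np.argsort(-sims)[:k]]
```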
arXiv Detail & Related papers (2024-02-12T18:28:36Z) - Targeted Attack Improves Protection against Unauthorized Diffusion Customization [3.1678356835951273]
Diffusion models set a new milestone for image generation, yet they raise public concerns.
They can be fine-tuned on unauthorized images for customization.
Current protection, leveraging untargeted attacks, does not appear to be effective enough.
We propose a simple yet effective improvement for the protection against unauthorized diffusion customization by introducing targeted attacks.
arXiv Detail & Related papers (2023-10-07T05:24:42Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and an adversary from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)