Related papers: PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

URL: http://arxiv.org/abs/2402.07867v1
Date: Mon, 12 Feb 2024 18:28:36 GMT
Title: PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
Abstract summary: We propose PoisonedRAG, a set of knowledge poisoning attacks to RAG. We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts. Our results show our attacks could achieve 90% attack success rates when injecting 5 poisoned texts for each target question into a database with millions of texts.
Score: 49.606341607616926
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate those limitations. In particular, given a question, RAG retrieves relevant knowledge from a knowledge database to augment the input of the LLM. For instance, the retrieved knowledge could be a set of top-k texts that are most semantically similar to the given question when the knowledge database contains millions of texts collected from Wikipedia. As a result, the LLM could utilize the retrieved knowledge as the context to generate an answer for the given question. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. Particularly, we propose PoisonedRAG , a set of knowledge poisoning attacks to RAG, where an attacker could inject a few poisoned texts into the knowledge database such that the LLM generates an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on the RAG, we propose two solutions to solve the optimization problem, respectively. Our results on multiple benchmark datasets and LLMs show our attacks could achieve 90% attack success rates when injecting 5 poisoned texts for each target question into a database with millions of texts. We also evaluate recent defenses and our results show they are insufficient to defend against our attacks, highlighting the need for new defenses.

Related papers

Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation [9.625480143413405]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs)<n>One such attack is the PoisonedRAG in which the injected adversarial texts steer the model to generate an attacker-chosen response to a target question.<n>We propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack.
arXiv Detail & Related papers (2025-08-04T19:03:52Z)
The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities.<n>This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set.<n>We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z)
Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs [8.09404178079053]
Retrieval-Augmented Generation (RAG) integrates Large Language Models (LLMs) with external knowledge bases, improving output quality while introducing new security risks. Existing studies on RAG vulnerabilities typically focus on exploiting the retrieval mechanism to inject erroneous knowledge or malicious texts, inducing incorrect outputs. In this paper, we uncover a novel vulnerability: the safety guardrails of LLMs, while designed for protection, can also be exploited as an attack vector by adversaries. Building on this vulnerability, we propose MutedRAG, a novel denial-of-service attack that reversely leverages the guardrails to undermine the availability of
arXiv Detail & Related papers (2025-04-30T14:18:11Z)
Practical Poisoning Attacks against Retrieval-Augmented Generation [9.320227105592917]
Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. We propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text.
arXiv Detail & Related papers (2025-04-04T21:49:42Z)
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG) RAG enhances MLLMs by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z)
FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [19.41533176888415]
Retrieval-Augmented Generation (RAG) addresses hallucination and real-time constraints by dynamically retrieving relevant information from a knowledge database. In this paper, we unveil a more realistic and threatening scenario: opinion manipulation for controversial topics against RAG. We propose a novel RAG black-box attack method, termed FlipedRAG, which is transfer-based.
arXiv Detail & Related papers (2025-01-06T12:24:57Z)
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models [18.301965456681764]
We reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG) HijackRAG enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. We propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge.
arXiv Detail & Related papers (2024-10-30T09:15:51Z)
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge. This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
arXiv Detail & Related papers (2024-08-09T12:26:05Z)
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks [10.36389246679405]
Malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. We conduct an investigation of various poisoning techniques targeting the large language models' fine-tuning phase via the Efficient Fine-Tuning (PEFT) method. Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT.
arXiv Detail & Related papers (2024-07-17T03:02:15Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation [30.63258739968483]
Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) We propose new attack vectors that allow an adversary to inject a single malicious document into a RAG system's knowledge base, and mount a backdoor poisoning attack. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama, and show that they transfer to GPT-3.5 Turbo and GPT-4.
arXiv Detail & Related papers (2024-05-30T21:19:24Z)
Certifiably Robust RAG against Retrieval Corruption [58.677292678310934]
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks.
arXiv Detail & Related papers (2024-05-24T13:44:25Z)
Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations [9.209974698634175]
Retrieval-Augmented Generation (RAG) is a promising solution for addressing the limitations of Large Language Models (LLMs) In this work, we investigate two underexplored aspects when assessing the robustness of RAG. We introduce a novel attack method, the Genetic Attack on RAG (textitGARAG), which targets these aspects.
arXiv Detail & Related papers (2024-04-22T07:49:36Z)
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data. In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. We present ROAR, a new class of attacks that instantiate a variety of such threats. We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z)
Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning [48.49658986576776]
Deep Reinforcement Learning (DRL) has numerous applications in the real world thanks to its outstanding ability in adapting to the surrounding environments. Despite its great advantages, DRL is susceptible to adversarial attacks, which precludes its use in real-life critical systems and applications. This paper presents emerging attacks in DRL-based systems and the potential countermeasures to defend against these attacks.
arXiv Detail & Related papers (2020-01-27T10:53:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.