Related papers: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

URL: http://arxiv.org/abs/2402.07867v3
Date: Tue, 13 Aug 2024 01:55:06 GMT
Title: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia,
Abstract summary: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
Score: 45.409248316497674
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. The key idea of RAG is to ground the answer generation of an LLM on external knowledge retrieved from a knowledge database. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG, where an attacker could inject a few malicious texts into the knowledge database of a RAG system to induce an LLM to generate an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge corruption attacks as an optimization problem, whose solution is a set of malicious texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on a RAG system, we propose two solutions to solve the optimization problem, respectively. Our results show PoisonedRAG could achieve a 90% attack success rate when injecting five malicious texts for each target question into a knowledge database with millions of texts. We also evaluate several defenses and our results show they are insufficient to defend against PoisonedRAG, highlighting the need for new defenses.

Related papers

Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs [8.09404178079053]
Retrieval-Augmented Generation (RAG) integrates Large Language Models (LLMs) with external knowledge bases, improving output quality while introducing new security risks. Existing studies on RAG vulnerabilities typically focus on exploiting the retrieval mechanism to inject erroneous knowledge or malicious texts, inducing incorrect outputs. In this paper, we uncover a novel vulnerability: the safety guardrails of LLMs, while designed for protection, can also be exploited as an attack vector by adversaries. Building on this vulnerability, we propose MutedRAG, a novel denial-of-service attack that reversely leverages the guardrails to undermine the availability of
arXiv Detail & Related papers (2025-04-30T14:18:11Z)
Practical Poisoning Attacks against Retrieval-Augmented Generation [9.320227105592917]
Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. We propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text.
arXiv Detail & Related papers (2025-04-04T21:49:42Z)
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG) RAG enhances MLLMs by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z)
FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [19.41533176888415]
Retrieval-Augmented Generation (RAG) addresses hallucination and real-time constraints by dynamically retrieving relevant information from a knowledge database. In this paper, we unveil a more realistic and threatening scenario: opinion manipulation for controversial topics against RAG. We propose a novel RAG black-box attack method, termed FlipedRAG, which is transfer-based.
arXiv Detail & Related papers (2025-01-06T12:24:57Z)
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models [18.301965456681764]
We reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG) HijackRAG enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. We propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge.
arXiv Detail & Related papers (2024-10-30T09:15:51Z)
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge. This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
arXiv Detail & Related papers (2024-08-09T12:26:05Z)
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks [10.36389246679405]
Malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. We conduct an investigation of various poisoning techniques targeting the large language models' fine-tuning phase via the Efficient Fine-Tuning (PEFT) method. Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT.
arXiv Detail & Related papers (2024-07-17T03:02:15Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation [30.63258739968483]
Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) We propose new attack vectors that allow an adversary to inject a single malicious document into a RAG system's knowledge base, and mount a backdoor poisoning attack. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama, and show that they transfer to GPT-3.5 Turbo and GPT-4.
arXiv Detail & Related papers (2024-05-30T21:19:24Z)
Certifiably Robust RAG against Retrieval Corruption [58.677292678310934]
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks.
arXiv Detail & Related papers (2024-05-24T13:44:25Z)
Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations [9.209974698634175]
Retrieval-Augmented Generation (RAG) is a promising solution for addressing the limitations of Large Language Models (LLMs) In this work, we investigate two underexplored aspects when assessing the robustness of RAG. We introduce a novel attack method, the Genetic Attack on RAG (textitGARAG), which targets these aspects.
arXiv Detail & Related papers (2024-04-22T07:49:36Z)
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data. In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. We present ROAR, a new class of attacks that instantiate a variety of such threats. We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z)
Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning [48.49658986576776]
Deep Reinforcement Learning (DRL) has numerous applications in the real world thanks to its outstanding ability in adapting to the surrounding environments. Despite its great advantages, DRL is susceptible to adversarial attacks, which precludes its use in real-life critical systems and applications. This paper presents emerging attacks in DRL-based systems and the potential countermeasures to defend against these attacks.
arXiv Detail & Related papers (2020-01-27T10:53:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.