PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented
Generation of Large Language Models
- URL: http://arxiv.org/abs/2402.07867v1
- Date: Mon, 12 Feb 2024 18:28:36 GMT
- Title: PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented
Generation of Large Language Models
- Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
- Abstract summary: We propose PoisonedRAG, a set of knowledge poisoning attacks to RAG.
We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts.
Our results show our attacks could achieve 90% attack success rates when injecting 5 poisoned texts for each target question into a database with millions of texts.
- Score: 49.606341607616926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved remarkable success due to their
exceptional generative capabilities. Despite their success, they also have
inherent limitations such as a lack of up-to-date knowledge and hallucination.
Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to
mitigate those limitations. In particular, given a question, RAG retrieves
relevant knowledge from a knowledge database to augment the input of the LLM.
For instance, the retrieved knowledge could be a set of top-k texts that are
most semantically similar to the given question when the knowledge database
contains millions of texts collected from Wikipedia. As a result, the LLM could
utilize the retrieved knowledge as the context to generate an answer for the
given question. Existing studies mainly focus on improving the accuracy or
efficiency of RAG, leaving its security largely unexplored. We aim to bridge
the gap in this work. Particularly, we propose PoisonedRAG , a set of knowledge
poisoning attacks to RAG, where an attacker could inject a few poisoned texts
into the knowledge database such that the LLM generates an attacker-chosen
target answer for an attacker-chosen target question. We formulate knowledge
poisoning attacks as an optimization problem, whose solution is a set of
poisoned texts. Depending on the background knowledge (e.g., black-box and
white-box settings) of an attacker on the RAG, we propose two solutions to
solve the optimization problem, respectively. Our results on multiple benchmark
datasets and LLMs show our attacks could achieve 90% attack success rates when
injecting 5 poisoned texts for each target question into a database with
millions of texts. We also evaluate recent defenses and our results show they
are insufficient to defend against our attacks, highlighting the need for new
defenses.
Related papers
- Turning Generative Models Degenerate: The Power of Data Poisoning Attacks [10.36389246679405]
Malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs.
We conduct an investigation of various poisoning techniques targeting the large language models' fine-tuning phase via the Efficient Fine-Tuning (PEFT) method.
Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT.
arXiv Detail & Related papers (2024-07-17T03:02:15Z) - BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models [18.107026036897132]
Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data.
Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models.
RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web.
arXiv Detail & Related papers (2024-06-03T02:25:33Z) - Phantom: General Trigger Attacks on Retrieval Augmented Language Generation [30.63258739968483]
We propose new attack surfaces for an adversary to compromise a victim's RAG system.
The first step involves crafting a poisoned document designed to be retrieved by the RAG system.
In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks.
arXiv Detail & Related papers (2024-05-30T21:19:24Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Learning to Poison Large Language Models During Instruction Tuning [10.450787229190203]
This work identifies additional security risks in Large Language Models (LLMs) by designing a new data poisoning attack tailored to exploit the instruction tuning process.
We propose a novel gradient-guided backdoor trigger learning approach to identify adversarial triggers efficiently.
Our strategy demonstrates a high success rate in compromising model outputs.
arXiv Detail & Related papers (2024-02-21T01:30:03Z) - Leveraging the Context through Multi-Round Interactions for Jailbreaking
Attacks [60.7432588386185]
Large Language Models (LLMs) are susceptible to Jailbreaking attacks.
Jailbreaking attacks aim to extract harmful information by subtly modifying the attack query.
We focus on a new attack form called Contextual Interaction Attack.
arXiv Detail & Related papers (2024-02-14T13:45:19Z) - Forcing Generative Models to Degenerate Ones: The Power of Data
Poisoning Attacks [10.732558183444985]
Malicious actors can covertly exploit large language models (LLMs) vulnerabilities through poisoning attacks aimed at generating undesirable outputs.
This paper explores various poisoning techniques to assess their effectiveness across a range of generative tasks.
We show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1% of the total tuning data samples.
arXiv Detail & Related papers (2023-12-07T23:26:06Z) - Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game [86.66627242073724]
This paper presents a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection.
To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs.
We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking.
arXiv Detail & Related papers (2023-11-02T06:13:36Z) - Attack Prompt Generation for Red Teaming and Defending Large Language
Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z) - On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors.
We present ROAR, a new class of attacks that instantiate a variety of such threats.
We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.