Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
- URL: http://arxiv.org/abs/2411.01705v2
- Date: Sun, 30 Mar 2025 01:49:11 GMT
- Title: Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
- Authors: Yuefeng Peng, Junda Wang, Hong Yu, Amir Houmansadr,
- Abstract summary: We investigate data extraction attacks targeting RAG's knowledge databases.<n>We show that previous prompt injection-based extraction attacks largely rely on the instruction-following capabilities of LLMs.<n>We propose to backdoor RAG, where a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM.
- Score: 15.861833242429228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant advancements, large language models (LLMs) still struggle with providing accurate answers when lacking domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks targeting RAG's knowledge databases. We show that previous prompt injection-based extraction attacks largely rely on the instruction-following capabilities of LLMs. As a result, they fail on models that are less responsive to such malicious prompts -- for example, our experiments show that state-of-the-art attacks achieve near-zero success on Gemma-2B-IT. Moreover, even for models that can follow these instructions, we found fine-tuning may significantly reduce attack performance. To further reveal the vulnerability, we propose to backdoor RAG, where a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM. When this compromised LLM is integrated into a RAG system, attackers can exploit specific triggers in prompts to manipulate the LLM to leak documents from the retrieval database. By carefully designing the poisoned data, we achieve both verbatim and paraphrased document extraction. For example, on Gemma-2B-IT, we show that with only 5\% poisoned data, our method achieves an average success rate of 94.1\% for verbatim extraction (ROUGE-L score: 82.1) and 63.6\% for paraphrased extraction (average ROUGE score: 66.4) across four datasets. These results underscore the privacy risks associated with the supply chain when deploying RAG systems.
Related papers
- MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)
RAG enhances MLLMs by grounding responses in query-relevant external knowledge.
This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.
We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z) - PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
LLMs may be fooled into outputting private information under carefully crafted adversarial prompts.
PrivAgent is a novel black-box red-teaming framework for privacy leakage.
arXiv Detail & Related papers (2024-12-07T20:09:01Z) - RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
RevPRAG is a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection.
Our results on multiple benchmark datasets and RAG architectures show our approach could achieve 98% true positive rate, while maintaining false positive rates close to 1%.
arXiv Detail & Related papers (2024-11-28T06:29:46Z) - RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks [18.576435409729655]
We propose an agent-based automated privacy attack called RAG-Thief.
It can extract a scalable amount of private data from the private database used in RAG applications.
Our findings highlight the privacy vulnerabilities in current RAG applications and underscore the pressing need for stronger safeguards.
arXiv Detail & Related papers (2024-11-21T13:18:03Z) - Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
Retrieval Augmented Generation (RAG) addresses this issue by combining Large Language Models with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
arXiv Detail & Related papers (2024-10-18T14:02:34Z) - MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z) - Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge.
This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
arXiv Detail & Related papers (2024-08-09T12:26:05Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z) - BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models [18.107026036897132]
Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data.
Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models.
RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web.
arXiv Detail & Related papers (2024-06-03T02:25:33Z) - Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications [10.06789804722156]
We reveal a new threat to LLM-powered applications, termed retrieval poisoning, where attackers can guide the application to yield malicious responses during the RAG process.
Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in the real-world application.
arXiv Detail & Related papers (2024-04-26T07:11:18Z) - Prompt Leakage effect and defense strategies for multi-turn LLM interactions [95.33778028192593]
Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker.
We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting.
We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
arXiv Detail & Related papers (2024-04-24T23:39:58Z) - Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems [22.142588104314175]
We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs)
We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore.
We design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries.
arXiv Detail & Related papers (2024-02-27T19:08:05Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - PAL: Proxy-Guided Black-Box Attack on Large Language Models [55.57987172146731]
Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated capabilities to generate harmful content when manipulated.
We introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting.
Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art.
arXiv Detail & Related papers (2024-02-15T02:54:49Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained
Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.