Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
- URL: http://arxiv.org/abs/2411.01705v1
- Date: Sun, 03 Nov 2024 22:27:40 GMT
- Title: Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
- Authors: Yuefeng Peng, Junda Wang, Hong Yu, Amir Houmansadr
- Abstract summary: We investigate data extraction attacks targeting the knowledge databases of Retrieval-Augmented Generation (RAG) systems.
To reveal the vulnerability, we propose to backdoor RAG, where a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM.
- Score: 15.861833242429228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant advancements, large language models (LLMs) still struggle with providing accurate answers when lacking domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks targeting the knowledge databases of RAG systems. We demonstrate that previous attacks on RAG largely depend on the instruction-following capabilities of LLMs, and that simple fine-tuning can reduce the success rate of such attacks to nearly zero. This makes these attacks impractical since fine-tuning is a common practice when deploying LLMs in specific domains. To further reveal the vulnerability, we propose to backdoor RAG, where a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM. When this compromised LLM is integrated into a RAG system, attackers can exploit specific triggers in prompts to manipulate the LLM to leak documents from the retrieval database. By carefully designing the poisoned data, we achieve both verbatim and paraphrased document extraction. We show that with only 3% poisoned data, our method achieves an average success rate of 79.7% in verbatim extraction on Llama2-7B, with a ROUGE-L score of 64.21, and a 68.6% average success rate in paraphrased extraction, with an average ROUGE score of 52.6 across four datasets. These results underscore the privacy risks associated with the supply chain when deploying RAG systems.
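- Note on the metric: the abstract reports extraction quality as ROUGE-L, which scores the longest common subsequence (LCS) shared by a retrieved document and the model output. The sketch below is a minimal illustration of the standard LCS-based F1 form of that metric, not the authors' evaluation code. The function names, the lowercase whitespace tokenization, and the choice of F1 are assumptions made for illustration; the abstract does not say which ROUGE implementation or settings were used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference document and a model output.

    Assumption: tokens are lowercase whitespace-separated words.
    Returns a value in [0, 1]; multiply by 100 to compare with
    scores reported on a 0-100 scale.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    out = "the cat on the mat"
    print(round(rouge_l_f1(doc, out), 3))
```

  Under these assumptions, an output that reproduces a document verbatim scores 1.0, and the example pair above scores 10/11 (about 0.909) because the output drops one of the six reference words.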
 
      
Related papers
- Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! [77.5835471257498]
 Fine-tuning on open-source Large Language Models (LLMs) with proprietary data is now a standard practice for downstream developers.
We reveal a new and concerning risk along with the practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data.
 arXiv  Detail & Related papers  (2025-05-21T15:32:14Z)
- POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
 Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.
Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.
In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
 arXiv  Detail & Related papers  (2025-05-10T09:36:28Z)
- Defending against Indirect Prompt Injection by Instruction Detection [81.98614607987793]
 We propose a novel approach that takes external data as input and leverages the behavioral state of LLMs during both forward and backward propagation to detect potential IPI attacks.
Our approach achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, while reducing the attack success rate to just 0.12% on the BIPIA benchmark.
 arXiv  Detail & Related papers  (2025-05-08T13:04:45Z)
- MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
 Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)
RAG enhances MLLMs by grounding responses in query-relevant external knowledge.
This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.
We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
 arXiv  Detail & Related papers  (2025-02-25T04:23:59Z)
- PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
 LLMs may be fooled into outputting private information under carefully crafted adversarial prompts.
PrivAgent is a novel black-box red-teaming framework for privacy leakage.
 arXiv  Detail & Related papers  (2024-12-07T20:09:01Z)
- RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
 RevPRAG is a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection.
Our results on multiple benchmark datasets and RAG architectures show our approach could achieve 98% true positive rate, while maintaining false positive rates close to 1%.
 arXiv  Detail & Related papers  (2024-11-28T06:29:46Z)
- RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks [18.576435409729655]
 We propose an agent-based automated privacy attack called RAG-Thief.
It can extract a scalable amount of private data from the private database used in RAG applications.
Our findings highlight the privacy vulnerabilities in current RAG applications and underscore the pressing need for stronger safeguards.
 arXiv  Detail & Related papers  (2024-11-21T13:18:03Z)
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
 Retrieval Augmented Generation (RAG) addresses this issue by combining Large Language Models with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
 arXiv  Detail & Related papers  (2024-10-18T14:02:34Z)
- MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
 Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
 arXiv  Detail & Related papers  (2024-08-20T10:44:29Z)
- Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
 Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge.
This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
 arXiv  Detail & Related papers  (2024-08-09T12:26:05Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
 This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
 arXiv  Detail & Related papers  (2024-07-23T15:31:26Z)
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
 We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
 arXiv  Detail & Related papers  (2024-07-17T17:59:47Z)
- BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models [18.107026036897132]
 Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data.
Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models.
RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web.
 arXiv  Detail & Related papers  (2024-06-03T02:25:33Z)
- Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications [10.06789804722156]
 We reveal a new threat to LLM-powered applications, termed retrieval poisoning, where attackers can guide the application to yield malicious responses during the RAG process.
Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in the real-world application.
 arXiv  Detail & Related papers  (2024-04-26T07:11:18Z)
- Prompt Leakage effect and defense strategies for multi-turn LLM interactions [95.33778028192593]
 Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker.
We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting.
We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
 arXiv  Detail & Related papers  (2024-04-24T23:39:58Z)
- Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems [22.142588104314175]
 We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs).
We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore.
We design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries.
 arXiv  Detail & Related papers  (2024-02-27T19:08:05Z)
- The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
 Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
 arXiv  Detail & Related papers  (2024-02-23T18:35:15Z)
- PAL: Proxy-Guided Black-Box Attack on Large Language Models [55.57987172146731]
 Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated capabilities to generate harmful content when manipulated.
We introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting.
Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art.
 arXiv  Detail & Related papers  (2024-02-15T02:54:49Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
 Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
 arXiv  Detail & Related papers  (2023-10-28T08:21:16Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
 This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
 arXiv  Detail & Related papers  (2023-08-25T14:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     