Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
- URL: http://arxiv.org/abs/2410.14479v1
- Date: Fri, 18 Oct 2024 14:02:34 GMT
- Title: Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
- Authors: Cody Clop, Yannick Teglia,
- Abstract summary: Retrieval Augmented Generation (RAG) addresses this issue by combining Large Language Models with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
- Score: 0.0
- License:
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent text but remain limited by the static nature of their training data. Retrieval Augmented Generation (RAG) addresses this issue by combining LLMs with up-to-date information retrieval, but also expand the attack surface of the system. This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation, such as inserting harmful links, promoting unauthorized services, and initiating denial-of-service behaviors. We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component. Our experiments reveal that corpus poisoning can achieve significant attack success rates through the injection of a small number of compromised documents into the retriever corpus. In contrast, backdoor attacks demonstrate even higher success rates but necessitate a more complex setup, as the victim must fine-tune the retriever using the attacker poisoned dataset.
Related papers
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$2$AO)
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z) - On the Vulnerability of Applying Retrieval-Augmented Generation within
Knowledge-Intensive Application Domains [34.122040172188406]
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains.
We show that RAG is vulnerable to universal poisoning attacks in medical Q&A.
We develop a new detection-based defense to ensure the safe use of RAG.
arXiv Detail & Related papers (2024-09-12T02:43:40Z) - Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks [10.26810397377592]
We introduce the Efficient and Stealthy Textual backdoor attack method, EST-Bad, leveraging Large Language Models (LLMs)
Our EST-Bad encompasses three core strategies: optimizing the inherent flaw of models as the trigger, stealthily injecting triggers with LLMs, and meticulously selecting the most impactful samples for backdoor injection.
arXiv Detail & Related papers (2024-08-21T12:50:23Z) - Corpus Poisoning via Approximate Greedy Gradient Descent [48.5847914481222]
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains.
arXiv Detail & Related papers (2024-06-07T17:02:35Z) - BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models [18.107026036897132]
Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data.
Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models.
RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web.
arXiv Detail & Related papers (2024-06-03T02:25:33Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - SSL-OTA: Unveiling Backdoor Threats in Self-Supervised Learning for Object Detection [8.178238811631093]
We propose the first backdoor attack designed for object detection tasks in SSL scenarios, called Object Transform Attack (SSL-OTA)
SSL-OTA employs a trigger capable of altering predictions of the target object to the desired category.
We conduct extensive experiments on benchmark datasets, demonstrating the effectiveness of our proposed attack and its resistance to potential defenses.
arXiv Detail & Related papers (2023-12-30T04:21:12Z) - Poisoning Retrieval Corpora by Injecting Adversarial Passages [79.14287273842878]
We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages.
When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems.
We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
arXiv Detail & Related papers (2023-10-29T21:13:31Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised
Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.