On the Vulnerability of Applying Retrieval-Augmented Generation within
Knowledge-Intensive Application Domains
- URL: http://arxiv.org/abs/2409.17275v1
- Date: Thu, 12 Sep 2024 02:43:40 GMT
- Title: On the Vulnerability of Applying Retrieval-Augmented Generation within
Knowledge-Intensive Application Domains
- Authors: Xun Xian, Ganghua Wang, Xuan Bi, Jayanth Srinivasa, Ashish Kundu,
Charles Fleming, Mingyi Hong, Jie Ding
- Abstract summary: Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains.
We show that RAG is vulnerable to universal poisoning attacks in medical Q&A.
We develop a new detection-based defense to ensure the safe use of RAG.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) has been empirically shown to enhance
the performance of large language models (LLMs) in knowledge-intensive domains
such as healthcare, finance, and legal contexts. Given a query, RAG retrieves
relevant documents from a corpus and integrates them into the LLMs' generation
process. In this study, we investigate the adversarial robustness of RAG,
focusing specifically on examining the retrieval system. First, across 225
different setup combinations of corpus, retriever, query, and targeted
information, we show that retrieval systems are vulnerable to universal
poisoning attacks in medical Q&A. In such attacks, adversaries generate
poisoned documents containing a broad spectrum of targeted information, such as
personally identifiable information. When these poisoned documents are inserted
into a corpus, they can be reliably retrieved by any user, as long as the
attacker-specified queries are issued. Investigating this vulnerability, we
found that the deviation of a poisoned document's embedding from the query's
embedding follows a consistent pattern: the high similarity between the
poisoned document and the query is preserved, which enables precise
retrieval. Based on these findings, we develop a new detection-based
defense to ensure the safe use of RAG. Through extensive experiments spanning
various Q&A domains, we observe that our proposed method consistently
achieves excellent detection rates in nearly all cases.
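To make the attack-and-defense idea above concrete, here is a minimal sketch. The hashed bag-of-words encoder, the toy medical corpus, and the median-gap detection rule are all illustrative assumptions, not the paper's actual retrievers, poisoning procedure, or defense.

```python
# Sketch of the universal poisoning attack and a similarity-based detection
# heuristic. All components here are illustrative stand-ins.
import numpy as np

VOCAB = 4096

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a dense retriever: hashed bag-of-words, L2-normalized."""
    v = np.zeros(VOCAB)
    for tok in text.lower().split():
        v[hash(tok) % VOCAB] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

corpus = [
    "Aspirin is commonly used to reduce fever and relieve mild pain.",
    "Metformin is a first-line medication for type 2 diabetes.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug.",
]

# Attack: prepend the attacker-specified query to the targeted payload so the
# poisoned document stays maximally similar to that exact query.
attacker_query = "What is the recommended dosage of aspirin for adults?"
payload = "(targeted information, e.g., fabricated contact details)"
corpus.append(attacker_query + " " + payload)

embs = np.stack([embed(d) for d in corpus])
q = embed(attacker_query)
sims = embs @ q                        # cosine similarities (unit vectors)
print("retrieved:", corpus[int(np.argmax(sims))][:50])

# Detection heuristic: the poisoned document sits anomalously close to the
# query, so flag documents whose similarity exceeds the corpus median by a
# large margin (0.3 is an illustrative value).
margin = 0.3
for doc, s in zip(corpus, sims):
    if s > np.median(sims) + margin:
        print("flagged as suspicious:", doc[:50])
```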
Related papers
- Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation (2025-02-01)
We present the Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore.
We demonstrate successful inference with just 30 queries while remaining stealthy.
We observe a 2x improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations.
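A schematic sketch of how such a query-based membership test could be wired up is below; `rag_answer`, the probe questions, and the containment-based scoring rule are hypothetical placeholders, not IA's actual procedure (which uses roughly 30 natural-sounding queries).

```python
# Schematic membership-inference loop: probe the RAG system with questions
# answerable only from the suspected document, then infer membership from how
# often document-specific facts surface. Illustrative sketch only.
from typing import Callable, List

def infer_membership(
    rag_answer: Callable[[str], str],   # hypothetical RAG endpoint
    probe_questions: List[str],
    expected_answers: List[str],
    threshold: float = 0.5,
) -> bool:
    """Guess whether the target document is present in the RAG datastore."""
    hits = 0
    for question, expected in zip(probe_questions, expected_answers):
        answer = rag_answer(question)
        # Crude scoring: did the document-specific fact surface in the answer?
        if expected.lower() in answer.lower():
            hits += 1
    # Membership is inferred when enough document-specific probes succeed.
    return hits / len(probe_questions) >= threshold

# Usage with a stub in place of a deployed RAG system:
stub = lambda q: "The trial enrolled 412 patients." if "enrolled" in q else "I don't know."
print(infer_membership(stub, ["How many patients were enrolled?"], ["412 patients"]))
```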
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges (2025-01-30)
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
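A toy illustration of threat (2), copying the query into an otherwise non-relevant passage, is below; the hashed bag-of-words encoder is an assumed stand-in for a dense retriever.

```python
# Query injection inflates a non-relevant passage's similarity to the query.
import numpy as np

def embed(text: str, dim: int = 2048) -> np.ndarray:
    """Hashed bag-of-words stand-in for a dense retriever, L2-normalized."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

query = "best treatment for seasonal allergies"
benign = "The museum's new exhibit features impressionist paintings from 1874."
# Injection: append the query verbatim; where the copy is placed and how much
# non-relevant material surrounds it are the factors studied above.
injected = benign + " " + query

print("benign similarity:  ", round(float(embed(benign) @ embed(query)), 3))
print("injected similarity:", round(float(embed(injected) @ embed(query)), 3))
```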
- Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks (2025-01-28)
Vision-language model (VLM)-based retrievers leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods.
In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers.
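As a rough sketch of the general idea (not the paper's three specific methods), one can run projected gradient ascent on pixels to pull a screenshot's embedding toward a target query embedding; the linear encoder below is an illustrative stand-in for a VLM.

```python
# Toy pixel poisoning: PGD on image pixels to raise cosine similarity between
# the screenshot embedding and an attacker-chosen query embedding.
import numpy as np

rng = np.random.default_rng(0)
D, P = 64, 32 * 32                   # embedding dim, flattened pixel count
W = rng.standard_normal((D, P)) / np.sqrt(P)  # stand-in screenshot encoder

target_q = rng.standard_normal(D)    # embedding of the attacker's target query
target_q /= np.linalg.norm(target_q)

def cos_to_target(x: np.ndarray) -> float:
    e = W @ x
    return float(e @ target_q / np.linalg.norm(e))

x0 = rng.uniform(0.0, 1.0, P)        # the benign document screenshot
x, eps, lr = x0.copy(), 0.05, 0.01   # perturbation budget and step size

for _ in range(200):
    e = W @ x
    n = np.linalg.norm(e)
    # Gradient of cos(Wx, q) w.r.t. x: W^T (q/||e|| - (e.q) e/||e||^3).
    grad = W.T @ (target_q / n - (e @ target_q) * e / n**3)
    x = np.clip(x + lr * np.sign(grad), x0 - eps, x0 + eps)  # PGD step
    x = np.clip(x, 0.0, 1.0)         # stay in the valid pixel range

print("cosine before:", round(cos_to_target(x0), 3))
print("cosine after: ", round(cos_to_target(x), 3))
```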
- TrustRAG: Enhancing Robustness and Trustworthiness in RAG (2025-01-01)
TrustRAG is a framework that systematically filters compromised and irrelevant contents before they are retrieved for generation.
TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches.
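One generic pre-generation filter in this spirit drops clusters of near-duplicate retrieved passages, since coordinated corpus poisoning often injects many mutually similar documents; this is an illustrative heuristic, not TrustRAG's actual algorithm.

```python
# Drop retrieved passages that have near-duplicates among the other results.
import numpy as np

def filter_near_duplicates(embs: np.ndarray, dup_threshold: float = 0.95) -> list:
    """Return indices of passages to keep; rows of embs must be L2-normalized."""
    sims = embs @ embs.T
    keep = []
    for i in range(len(embs)):
        # Count how many *other* passages are near-duplicates of passage i.
        duplicates = int((sims[i] > dup_threshold).sum()) - 1
        if duplicates == 0:
            keep.append(i)
    return keep

# Usage: three organic passages plus two near-identical injected ones.
rng = np.random.default_rng(1)
organic = rng.standard_normal((3, 8))
injected = np.tile(rng.standard_normal(8), (2, 1)) + 0.01 * rng.standard_normal((2, 8))
embs = np.vstack([organic, injected])
embs /= np.linalg.norm(embs, axis=1, keepdims=True)
print(filter_near_duplicates(embs))  # the two injected copies get dropped
```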
- Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation (2024-10-10)
Language models (LMs) are known to suffer from hallucinations and misinformation.
Retrieval augmented generation (RAG) that retrieves verifiable information from an external knowledge corpus provides a tangible solution to these problems.
RAG generation quality is highly dependent on the relevance between a user's query and the retrieved documents.
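A minimal sketch of acting on that dependence is to gate generation on the best retrieval score and abstain when the corpus cannot support the query; the encoder and threshold are illustrative assumptions.

```python
# Abstain instead of generating when query-knowledge relevance is too low.
import numpy as np

def answer_or_abstain(query_emb: np.ndarray, corpus_embs: np.ndarray,
                      threshold: float = 0.35):
    """All embeddings L2-normalized. Return the best document's index, or
    None to signal that the system should abstain rather than hallucinate."""
    sims = corpus_embs @ query_emb
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```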
- Corpus Poisoning via Approximate Greedy Gradient Descent (2024-06-07)
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains.
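The flavor of such gradient-guided token substitution can be sketched with a toy mean-pooled encoder; the random embedding table and greedy first-order scoring below illustrate the underlying HotFlip idea, not the paper's AGGD.

```python
# Toy HotFlip-style loop: score every candidate token swap by a first-order
# approximation (gradient of query-passage cosine w.r.t. token embeddings),
# then greedily apply the best swap. Illustrative stand-in, not AGGD itself.
import numpy as np

rng = np.random.default_rng(0)
V, D, L = 500, 32, 8                 # vocab size, embedding dim, passage length
E = rng.standard_normal((V, D))      # token embedding table
q = rng.standard_normal(D)
q /= np.linalg.norm(q)               # target query embedding

tokens = rng.integers(0, V, L)       # adversarial passage, initialized randomly

def passage_emb(toks):
    return E[toks].mean(axis=0)      # mean-pooled (unnormalized) encoder

def score(toks):
    p = passage_emb(toks)
    return float(p @ q / np.linalg.norm(p))

for _ in range(20):
    p = passage_emb(tokens)
    n = np.linalg.norm(p)
    # Gradient of cos(p, q) w.r.t. p; each token embedding contributes 1/L.
    g = (q / n - (p @ q) * p / n**3) / L
    best_gain, best_swap = 0.0, None
    for pos in range(L):
        # First-order gain of replacing tokens[pos] with each vocab entry.
        gains = (E - E[tokens[pos]]) @ g
        cand = int(np.argmax(gains))
        if gains[cand] > best_gain:
            best_gain, best_swap = gains[cand], (pos, cand)
    if best_swap is None:            # no swap improves the approximation
        break
    tokens[best_swap[0]] = best_swap[1]

print("final cosine to query:", round(score(tokens), 3))
```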
- Poisoning Retrieval Corpora by Injecting Adversarial Passages (2023-10-29)
We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages.
When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems.
We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
- Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection (2023-07-31)
This paper introduces two types of detection tasks for adversarial documents.
A benchmark dataset is established to facilitate the investigation of adversarial ranking defense.
A comprehensive investigation of the performance of several detection baselines is conducted.
- ADC: Adversarial attacks against object Detection that evade Context consistency checks (2021-10-24)
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency, is still an open problem.
- Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency (2021-08-23)
We propose a Multi-Expert Adversarial Attack Detection (MEAAD) approach to detect malicious attacks on person re-identification (ReID) systems.
As the first adversarial attack detection approach for ReID, MEAAD effectively detects various adversarial attacks and achieves high ROC-AUC (over 97.5%).