Related papers: The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems

The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems

URL: http://arxiv.org/abs/2502.20995v1
Date: Fri, 28 Feb 2025 12:32:53 GMT
Title: The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems
Authors: Chanwoo Choi, Jinsoo Kim, Sukmin Cho, Soyeong Jeong, Buru Chang,
Abstract summary: We introduce a realistic black-box attack scenario based on the RAG paradox, where RAG systems inadvertently expose vulnerabilities while attempting to enhance trustworthiness.<n>Because RAG systems reference external documents during response generation, our attack targets these sources without requiring internal access.<n>Our approach first identifies the external sources disclosed by RAG systems and then automatically generates poisoned documents with misinformation designed to match these sources.
Score: 8.347617177093056
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the growing adoption of retrieval-augmented generation (RAG) systems, recent studies have introduced attack methods aimed at degrading their performance. However, these methods rely on unrealistic white-box assumptions, such as attackers having access to RAG systems' internal processes. To address this issue, we introduce a realistic black-box attack scenario based on the RAG paradox, where RAG systems inadvertently expose vulnerabilities while attempting to enhance trustworthiness. Because RAG systems reference external documents during response generation, our attack targets these sources without requiring internal access. Our approach first identifies the external sources disclosed by RAG systems and then automatically generates poisoned documents with misinformation designed to match these sources. Finally, these poisoned documents are newly published on the disclosed sources, disrupting the RAG system's response generation process. Both offline and online experiments confirm that this attack significantly reduces RAG performance without requiring internal access. Furthermore, from an insider perspective within the RAG system, we propose a re-ranking method that acts as a fundamental safeguard, offering minimal protection against unforeseen attacks.

Related papers

Safety Devolution in AI Agents [56.482973617087254]
This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation.<n>Retrieval-augmented agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval.<n>These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-augmented and increasingly autonomous AI systems.
arXiv Detail & Related papers (2025-05-20T11:21:40Z)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z)
Traceback of Poisoning Attacks to Retrieval-Augmented Generation [10.19539347377776]
Research has revealed RAG's susceptibility to poisoning attacks, where the attacker injects poisoned texts into the knowledge database. Existing defenses, which predominantly focus on inference-time mitigation, have proven insufficient against sophisticated attacks. We introduce RAGForensics, the first traceback system for RAG, designed to identify poisoned texts within the knowledge database that are responsible for the attacks.
arXiv Detail & Related papers (2025-04-30T14:10:02Z)
ControlNET: A Firewall for RAG-based LLM System [9.362574883495927]
Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs) RAG mitigates hallucinations by integrating external knowledge, yet introduces privacy risk and security risk, notably data breaching risk and data poisoning risk. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities.
arXiv Detail & Related papers (2025-04-13T14:18:35Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)<n>RAG enhances MLLMs by grounding responses in query-relevant external knowledge.<n>This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey [92.36487127683053]
Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC)<n>RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks.<n>Despite RAG's success and potential, recent studies have shown that the RAG paradigm also introduces new risks, including privacy concerns, adversarial attacks, and accountability issues.
arXiv Detail & Related papers (2025-02-08T06:50:47Z)
Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.<n>We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.<n>Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks [72.4498910775871]
Vision-language model (VLM)-based retrievers leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods.<n>In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers.
arXiv Detail & Related papers (2025-01-28T12:40:37Z)
TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation [31.231916859341865]
TrustRAG is a framework that systematically filters malicious and irrelevant content before it is retrieved for generation.<n>TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases [11.101624331624933]
This paper presents a black-box attack to force a RAG system to leak its private knowledge base.<n>A relevance-based mechanism and an attacker-side open-source LLM favor the generation of effective queries to leak most of the (hidden) knowledge base.
arXiv Detail & Related papers (2024-12-24T09:03:57Z)
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations. RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs. This paper investigates both the retrieval and the generation components of RAG systems to understand how to enhance their robustness against such attacks.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models [18.301965456681764]
We reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG) HijackRAG enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. We propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge.
arXiv Detail & Related papers (2024-10-30T09:15:51Z)
On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains [32.71308102835446]
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains.<n>We show that RAG is vulnerable to universal poisoning attacks in medical Q&A.<n>We develop a new detection-based defense to ensure the safe use of RAG.
arXiv Detail & Related papers (2024-09-12T02:43:40Z)
ConfusedPilot: Confused Deputy Risks in RAG-based LLMs [2.423202571519879]
We introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.
arXiv Detail & Related papers (2024-08-09T05:20:05Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems.<n>We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models.<n>Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
arXiv Detail & Related papers (2024-05-30T19:46:36Z)
Rethinking the Vulnerabilities of Face Recognition Systems:From a Practical Perspective [53.24281798458074]
Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning)
arXiv Detail & Related papers (2024-05-21T13:34:23Z)
RAGGED: Towards Informed Design of Scalable and Stable RAG Systems [51.171355532527365]
Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge.<n>RAGGED is a framework for systematically evaluating RAG systems.
arXiv Detail & Related papers (2024-03-14T02:26:31Z)
Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary. In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback. We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.