ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
- URL: http://arxiv.org/abs/2509.23519v1
- Date: Sat, 27 Sep 2025 22:36:42 GMT
- Authors: Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, Aleksandra Korolova
- Abstract summary: We present ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
- Score: 69.60882125603133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting for studying and protecting against such threats, as defense algorithms can benefit from built-in reliability signals -- like document ranking -- and represent a non-LLM challenge for the adversary due to decades of work to thwart SEO. Motivated by, but not limited to, this scenario, this work introduces ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our first contribution adopts a graph-theoretic perspective to identify a "consistent majority" among retrieved documents to filter out malicious ones. We introduce a novel algorithm based on finding a Maximum Independent Set (MIS) on a document graph where edges encode contradiction. Our MIS variant explicitly prioritizes higher-reliability documents and provides provable robustness guarantees against bounded adversarial corruption under natural assumptions. Recognizing the computational cost of exact MIS for large retrieval sets, our second contribution is a scalable weighted sample and aggregate framework. It explicitly utilizes reliability information, preserving some robustness guarantees while efficiently handling many documents. We present empirical results showing ReliabilityRAG provides superior robustness against adversarial attacks compared to prior methods, maintains high benign accuracy, and excels in long-form generation tasks where prior robustness-focused methods struggled. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
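The graph-based filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `reliability_aware_mis`, the brute-force search, and the tie-breaking rule (prefer the subset containing the highest-ranked documents) are assumptions made for illustration. Documents are indexed by rank, with 0 the most reliable, and each edge marks a pair of documents assumed to contradict each other. How contradictions are detected is outside the scope of this sketch.

```python
from itertools import combinations


def reliability_aware_mis(num_docs, contradictions):
    """Return a largest contradiction-free subset of documents 0..num_docs-1.

    Documents are indexed by rank (0 = most reliable). Among subsets of
    equal size, the lexicographically smallest index tuple wins, which
    favors higher-ranked documents. This tie-break is an assumption of
    the sketch, not a rule taken from the paper.
    """
    edges = {frozenset(pair) for pair in contradictions}
    # Try subset sizes from largest to smallest; exponential in num_docs,
    # so this is only practical for small retrieval sets.
    for size in range(num_docs, 0, -1):
        # combinations() yields tuples in lexicographic order, so the first
        # independent set found at this size is the preferred one.
        for subset in combinations(range(num_docs), size):
            if all(frozenset(p) not in edges for p in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical example: documents 3 and 4 are injected and contradict
# the three benign documents 0, 1, and 2, but agree with each other.
edges = [(0, 3), (1, 3), (2, 3), (0, 4), (1, 4), (2, 4)]
print(reliability_aware_mis(5, edges))  # prints [0, 1, 2]
```

In this toy case the benign documents form the "consistent majority", so the two injected documents are filtered out. The exhaustive search also shows why the abstract mentions the computational cost of exact MIS on large retrieval sets.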
Related papers
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search [72.87861928940929]
Boundary-Aware Policy Optimization (BAPO) is a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy. BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii) an adaptive reward modulator that strategically suspends this reward during early exploration, preventing the model from exploiting IDK as a shortcut.
arXiv Detail & Related papers (2026-01-16T07:06:58Z)
- SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG [35.42029959485188]
Retrieval-augmented generation (RAG) systems enhance large language models with external knowledge. Existing defenses often apply aggressive filtering, leading to unnecessary loss of valuable information. We propose a two-stage semantic filtering and conflict-free framework for trustworthy RAG.
arXiv Detail & Related papers (2025-10-10T03:44:29Z)
- Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft [16.826893547339548]
We introduce RPD, a novel dataset specifically designed for RAG plagiarism detection. We develop a dual-layered watermarking system that embeds protection at both semantic and lexical levels. This work establishes a foundational framework for intellectual property protection in retrieval-augmented AI systems.
arXiv Detail & Related papers (2025-10-09T03:09:18Z)
- Towards Reliable Retrieval in RAG Systems for Large Legal Datasets [6.376251215279889]
Retrieval-Augmented Generation (RAG) is a promising approach to mitigate hallucinations in Large Language Models (LLMs). This is particularly challenging in the legal domain, where large databases of structurally similar documents often cause retrieval systems to fail. We investigate a simple and computationally efficient technique which enhances each text chunk with a document-level synthetic summary. Our work provides evidence that this practical, scalable, and easily integrable technique enhances the reliability of RAG systems when applied to large-scale legal document datasets.
arXiv Detail & Related papers (2025-10-08T13:22:20Z)
- Provably Secure Retrieval-Augmented Generation [7.412110686946628]
This paper proposes the first provably secure framework for Retrieval-Augmented Generation (RAG) systems. Our framework employs a pre-storage full-encryption scheme to ensure dual protection of both retrieved content and vector embeddings.
arXiv Detail & Related papers (2025-08-01T21:37:16Z)
- MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG [65.0423152595537]
We propose MES-RAG, which enhances entity-specific query handling and provides accurate, secure, and consistent responses. MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access. Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering.
arXiv Detail & Related papers (2025-03-17T08:09:42Z)
- TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation [31.231916859341865]
TrustRAG is a framework that systematically filters malicious and irrelevant content before it is retrieved for generation. TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
- FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation [1.3824176915623292]
This paper introduces Federated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems.
FRAG enables mutually-distrusted parties to collaboratively perform Approximate $k$-Nearest Neighbor (ANN) searches on encrypted query vectors and encrypted data stored in distributed vector databases.
arXiv Detail & Related papers (2024-10-17T06:57:29Z)
- Certifiably Robust RAG against Retrieval Corruption [58.677292678310934]
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks.
In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks.
arXiv Detail & Related papers (2024-05-24T13:44:25Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)