Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain
- URL: http://arxiv.org/abs/2509.03787v1
- Date: Thu, 04 Sep 2025 00:45:58 GMT
- Title: Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain
- Authors: Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke, Amira Ghenai
- Abstract summary: Retrieval augmented generation (RAG) systems provide a method for factually grounding the responses of a Large Language Model (LLM) by providing retrieved evidence, or context, as support. This design introduces a critical vulnerability: LLMs may absorb and reproduce misinformation present in retrieved evidence. This problem is magnified if retrieved evidence contains adversarial material explicitly intended to promulgate misinformation.
- Score: 8.094811345546118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval augmented generation (RAG) systems provide a method for factually grounding the responses of a Large Language Model (LLM) by providing retrieved evidence, or context, as support. Guided by this context, RAG systems can reduce hallucinations and expand the ability of LLMs to accurately answer questions outside the scope of their training data. Unfortunately, this design introduces a critical vulnerability: LLMs may absorb and reproduce misinformation present in retrieved evidence. This problem is magnified if retrieved evidence contains adversarial material explicitly intended to promulgate misinformation. This paper presents a systematic evaluation of RAG robustness in the health domain and examines alignment between model outputs and ground-truth answers. We focus on the health domain due to the potential for harm caused by incorrect responses, as well as the availability of evidence-based ground truth for many common health-related questions. We conduct controlled experiments using common health questions, varying both the type and composition of the retrieved documents (helpful, harmful, and adversarial) as well as the framing of the question by the user (consistent, neutral, and inconsistent). Our findings reveal that adversarial documents substantially degrade alignment, but robustness can be preserved when helpful evidence is also present in the retrieval pool. These findings offer actionable insights for designing safer RAG systems in high-stakes domains by highlighting the need for retrieval safeguards. To enable reproducibility and facilitate future research, all experimental results are publicly available in our GitHub repository: https://github.com/shakibaam/RAG_ROBUSTNESS_EVAL
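The controlled experimental design described in the abstract, crossing retrieved-document composition with user question framing, can be sketched as a simple condition grid. This is an illustrative reconstruction: the function and variable names below are assumptions, not the authors' actual evaluation code.

```python
from itertools import product

# Condition factors reported in the abstract (a hypothetical sketch).
DOC_TYPES = ["helpful", "harmful", "adversarial"]
FRAMINGS = ["consistent", "neutral", "inconsistent"]

def build_conditions():
    """Enumerate every (document type, question framing) experimental condition."""
    return [
        {"doc_type": d, "framing": f}
        for d, f in product(DOC_TYPES, FRAMINGS)
    ]

conditions = build_conditions()
print(len(conditions))  # 9 conditions in the full cross
```

Each health question would then be evaluated under all nine conditions, allowing the effect of adversarial evidence and question framing to be measured independently.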
Related papers
- When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare [0.05249805590164902]
This work investigates the performance of five large language models (LLMs) in generating responses to medicine-related queries. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers.
arXiv Detail & Related papers (2025-11-10T03:27:54Z)
- MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering [21.855579328680246]
We propose MedTrust-Guided Iterative RAG, a framework designed to enhance factual consistency and mitigate hallucinations in medical QA. First, it enforces citation-aware reasoning by requiring all generated content to be explicitly grounded in retrieved medical documents. Second, it employs an iterative retrieval-verification process, where a verification agent assesses evidence adequacy.
arXiv Detail & Related papers (2025-10-16T07:59:11Z)
- VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z)
- Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine [22.983780823136925]
Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare. We propose using LLMs to gather scattered evidence from multiple sources and present a knowledge hypergraph-based evidence management model. Our approach outperforms existing RAG techniques in application domains of interest to EBM, such as medical quizzing, hallucination detection, and decision support.
arXiv Detail & Related papers (2025-03-18T09:17:31Z)
- Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations. RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into the retrieval corpus can mislead models into producing factually incorrect outputs. We present a rigorously controlled empirical study of how RAG systems behave under such attacks and how their robustness can be improved.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce the Medical Retrieval-Augmented Generation Benchmark (MedRGB), which provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
- Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. Our research identifies two critical latent factors affecting RAG's confidence in its predictions. We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z)
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
- On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains [32.71308102835446]
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains. We show that RAG is vulnerable to universal poisoning attacks in medical Q&A. We develop a new detection-based defense to ensure the safe use of RAG.
arXiv Detail & Related papers (2024-09-12T02:43:40Z)
- Certifiably Robust RAG against Retrieval Corruption [58.677292678310934]
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks.
In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks.
arXiv Detail & Related papers (2024-05-24T13:44:25Z)
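The defense idea behind frameworks like RobustRAG (querying the model on each retrieved passage in isolation and then securely aggregating the per-passage answers) can be illustrated with a deliberately simplified majority vote. This is a minimal sketch, not the framework's actual mechanism, which uses more sophisticated keyword- and decoding-level aggregation; `answer_fn` and the toy data below are assumptions.

```python
from collections import Counter

def isolate_then_aggregate(passages, answer_fn):
    """Answer from each retrieved passage in isolation, then take a
    majority vote. As long as corrupted passages remain a minority,
    they cannot flip the aggregated answer."""
    answers = [answer_fn(p) for p in passages]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Toy usage: one poisoned passage out of three cannot change the outcome.
lookup = {"evidence A": "yes", "evidence B": "yes", "poisoned": "no"}
passages = ["evidence A", "evidence B", "poisoned"]
print(isolate_then_aggregate(passages, lookup.get))  # prints "yes"
```

The isolation step is what makes the robustness argument tractable: each passage influences exactly one vote, so an attacker controlling k of n passages controls at most k votes.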
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.