Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
- URL: http://arxiv.org/abs/2512.11614v1
- Date: Fri, 12 Dec 2025 14:50:38 GMT
- Title: Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
- Authors: Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu
- Abstract summary: We introduce a training framework that treats the entire RAG pipeline as an interactive proof system. We show that M/A-trained LLMs exhibit improved groundedness, completeness, soundness, and reject behavior. Our results demonstrate that autonomous interactive-proof-style supervision provides a principled and practical path toward reliable RAG systems.
- Score: 40.19713302778418
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Retrieval-augmented generation (RAG) models rely on retrieved evidence to guide large language model (LLM) generators, yet current systems treat retrieval as a weak heuristic rather than verifiable evidence. As a result, LLMs answer without support, hallucinate under incomplete or misleading context, and rely on spurious evidence. We introduce a training framework that treats the entire RAG pipeline -- both the retriever and the generator -- as an interactive proof system via an adaptation of the Merlin-Arthur (M/A) protocol. Arthur (the generator LLM) trains on questions of unknown provenance: Merlin provides helpful evidence, while Morgana injects adversarial, misleading context. Both use a linear-time XAI method to identify and modify the evidence most influential to Arthur. Consequently, Arthur learns to (i) answer when the context supports the answer, (ii) reject when evidence is insufficient, and (iii) rely on the specific context spans that truly ground the answer. We further introduce a rigorous evaluation framework to disentangle explanation fidelity from baseline predictive errors. This allows us to introduce and measure the Explained Information Fraction (EIF), which normalizes M/A-certified mutual-information guarantees relative to model capacity and imperfect benchmarks. Across three RAG datasets and two model families of varying sizes, M/A-trained LLMs show improved groundedness, completeness, soundness, and reject behavior, as well as reduced hallucinations -- without needing manually annotated unanswerable questions. The retriever likewise improves recall and MRR through automatically generated M/A hard positives and negatives. Our results demonstrate that autonomous interactive-proof-style supervision provides a principled and practical path toward reliable RAG systems that treat retrieved documents not as suggestions, but as verifiable evidence.
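The abstract describes the protocol only at a high level; the following is a minimal sketch of one M/A training step under stated assumptions. Everything here is hypothetical scaffolding rather than the paper's actual API: `REJECT`, `select_spans`, and the three callables are invented names, `attribute` merely stands in for the unspecified linear-time XAI method, and keeping or dropping the top-scoring half of the spans is a placeholder heuristic for how Merlin and Morgana identify and modify the most influential evidence.

```python
import random
from typing import Callable, List, Tuple

REJECT = "<reject>"  # hypothetical abstention target that teaches Arthur to reject


def select_spans(docs: List[str], scores: List[float], keep: bool) -> str:
    """Keep (Merlin) or drop (Morgana) the most influential half of the
    retrieved spans, ranked by the XAI attribution scores."""
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    half = max(1, len(ranked) // 2)
    chosen = ranked[:half] if keep else ranked[half:]
    return " ".join(doc for _, doc in chosen)


def ma_training_step(
    arthur_loss: Callable[[str, str, str], float],            # (question, context, target) -> loss
    attribute: Callable[[str, str, List[str]], List[float]],  # stand-in for the linear-time XAI method
    retrieve: Callable[[str], List[str]],
    batch: List[Tuple[str, str]],                             # (question, gold_answer) pairs
) -> float:
    """One step in which each question is routed, with provenance unknown
    to Arthur, to either the helpful or the adversarial prover."""
    total = 0.0
    for question, gold in batch:
        docs = retrieve(question)
        scores = attribute(question, gold, docs)
        if random.random() < 0.5:
            # Merlin round: the context retains the grounding spans,
            # so Arthur should answer.
            context, target = select_spans(docs, scores, keep=True), gold
        else:
            # Morgana round: the grounding spans are stripped, leaving
            # misleading context, so the sound behaviour is to reject.
            context, target = select_spans(docs, scores, keep=False), REJECT
        total += arthur_loss(question, context, target)
    return total / len(batch)
```

The 50/50 Merlin/Morgana split is likewise an assumption; the paper may weight the adversary differently or corrupt spans rather than delete them. Similarly, the abstract gives only the normalization idea behind EIF, not a formula; one hedged reading of "normalizes certified mutual-information guarantees relative to model capacity and imperfect benchmarks" is a simple ratio:

```python
def explained_information_fraction(i_certified: float, i_attainable: float) -> float:
    """Assumed form of EIF: the fraction of the information (in bits) that a
    model of this capacity could extract from the imperfect benchmark which
    is actually certified by the M/A completeness/soundness bounds. The
    paper's exact normalization may differ."""
    return i_certified / i_attainable if i_attainable > 0 else 0.0
```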
Related papers
- GRACE: Reinforcement Learning for Grounded Response and Abstention under Contextual Evidence [9.80421132842862]
Retrieval-Augmented Generation (RAG) integrates external knowledge to enhance Large Language Models (LLMs). RAG is susceptible to two critical flaws: providing correct answers without explicit grounded evidence and producing fabricated responses when the retrieved context is insufficient. We propose GRACE, a reinforcement-learning framework that simultaneously mitigates both types of flaws.
arXiv Detail & Related papers (2026-01-08T02:47:33Z)
- Rational Synthesizers or Heuristic Followers? Analyzing LLMs in RAG-based Question-Answering [0.0]
GroupQA is a curated dataset of 1,635 controversial questions paired with 15,058 diversely-sourced evidence documents. We characterize group-level evidence aggregation dynamics through controlled experiments. We find that LLM explanations for group-based answers are unfaithful.
arXiv Detail & Related papers (2026-01-08T01:54:21Z)
- ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages. ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
- Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises [0.0]
We propose a framework that integrates abductive inference into retrieval-augmented LLMs. Experimental results on abductive reasoning and multi-hop QA benchmarks show that our approach improves both answer accuracy and reasoning faithfulness. This work highlights abductive inference as a promising direction for enhancing the robustness and explainability of RAG systems.
arXiv Detail & Related papers (2025-11-06T03:37:24Z)
- VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z)
- Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs. We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z)
- Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources. We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios of conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z)
- Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals [5.605770511387228]
RAGuard is the first benchmark to evaluate the robustness of RAG systems against misleading retrievals. Unlike prior benchmarks that rely on synthetic noise, our fact-checking dataset captures naturally occurring misinformation.
arXiv Detail & Related papers (2025-02-22T05:50:15Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)