Fugu-MT paper translation (summary): Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols

Paper summary: Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols

arxiv url: http://arxiv.org/abs/2512.11614v1
Date: Fri, 12 Dec 2025 14:50:38 GMT
Status: translation complete
Last updated in system: 2025-12-15 15:48:11.809922
Title: Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
Authors: Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu
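Abstract summary: This paper proposes a training framework that treats the entire RAG pipeline as an interactive proof system. M/A-trained LLMs show improved groundedness, completeness, soundness, and reject behavior. The results demonstrate that autonomous interactive-proof-style supervision provides a principled and practical path toward reliable RAG systems.
Reference score (independently computed attention score): 40.19713302778418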
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Retrieval-augmented generation (RAG) models rely on retrieved evidence to guide large language model (LLM) generators, yet current systems treat retrieval as a weak heuristic rather than verifiable evidence. As a result, LLMs answer without support, hallucinate under incomplete or misleading context, and rely on spurious evidence. We introduce a training framework that treats the entire RAG pipeline -- both the retriever and the generator -- as an interactive proof system via an adaptation of the Merlin-Arthur (M/A) protocol. Arthur (the generator LLM) trains on questions of unknown provenance: Merlin provides helpful evidence, while Morgana injects adversarial, misleading context. Both use a linear-time XAI method to identify and modify the evidence most influential to Arthur. Consequently, Arthur learns to (i) answer when the context supports the answer, (ii) reject when evidence is insufficient, and (iii) rely on the specific context spans that truly ground the answer. We further introduce a rigorous evaluation framework to disentangle explanation fidelity from baseline predictive errors. This allows us to introduce and measure the Explained Information Fraction (EIF), which normalizes M/A certified mutual-information guarantees relative to model capacity and imperfect benchmarks. Across three RAG datasets and two model families of varying sizes, M/A-trained LLMs show improved groundedness, completeness, soundness, and reject behavior, as well as reduced hallucinations -- without needing manually annotated unanswerable questions. The retriever likewise improves recall and MRR through automatically generated M/A hard positives and negatives. Our results demonstrate that autonomous interactive-proof-style supervision provides a principled and practical path toward reliable RAG systems that treat retrieved documents not as suggestions, but as verifiable evidence.
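The abstract reports retriever gains in recall and MRR without defining either metric here. The following is a minimal sketch that assumes the standard definitions (recall at a cutoff k, and mean reciprocal rank of the first relevant document); the function names, document IDs, and cutoff are hypothetical illustrations and are not taken from the paper, whose exact evaluation setup may differ.

```python
from typing import Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear among the top-k results."""
    if not relevant_ids:
        return 0.0
    # Use a set so a document retrieved twice is not counted twice.
    retrieved = set(ranked_ids[:k])
    return len(retrieved & relevant_ids) / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical toy data: three queries with made-up document IDs.
example_runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d9"], {"d2", "d5"}),        # first relevant hit at rank 1
    (["d4"], {"d8"}),                    # no relevant document retrieved
]

if __name__ == "__main__":
    print("MRR:", mean_reciprocal_rank(example_runs))
    print("Recall@2 (query 2):", recall_at_k(["d2", "d9"], {"d2", "d5"}, k=2))
```

Under these definitions, hard negatives that push misleading documents down the ranking would raise MRR, while hard positives that surface more supporting documents within the cutoff would raise recall.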

Related papers