Fugu-MT Paper Translation (Abstract): Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation

Paper overview: Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation

arxiv url: http://arxiv.org/abs/2603.10143v1
Date: Tue, 10 Mar 2026 18:25:07 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.653165
Title: Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation
Title (reference translation): Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation
Authors: Eeham Khan, Luis Rodriguez, Marc Queudot
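Abstract summary: This paper proposes a domain-specific RAG framework that integrates explicit reasoning and faithfulness verification. The architecture augments standard retrieval with neural query rewriting, BGE-based cross-encoder reranking, and a rationale generation module. The framework is evaluated on the BioASQ and PubMedQA benchmarks, with a specific analysis of the impact of dynamic in-context learning.
Reference score (independently computed attention measure): 0.0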
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Retrieval-Augmented Generation (RAG) significantly improves the factuality of Large Language Models (LLMs), yet standard pipelines often lack mechanisms to verify intermediate reasoning, leaving them vulnerable to hallucinations in high-stakes domains. To address this, we propose a domain-specific RAG framework that integrates explicit reasoning and faithfulness verification. Our architecture augments standard retrieval with neural query rewriting, BGE-based cross-encoder reranking, and a rationale generation module that grounds sub-claims in specific evidence spans. We further introduce an eight-category verification taxonomy that enables fine-grained assessment of rationale faithfulness, distinguishing between explicit and implicit support patterns to facilitate structured error diagnosis. We evaluate this framework on the BioASQ and PubMedQA benchmarks, specifically analyzing the impact of dynamic in-context learning and reranking under constrained token budgets. Experiments demonstrate that explicit rationale generation improves accuracy over vanilla RAG baselines, while dynamic demonstration selection combined with robust reranking yields further gains in few-shot settings. Using Llama-3-8B-Instruct, our approach achieves 89.1% on BioASQ-Y/N and 73.0% on PubMedQA, competitive with systems using significantly larger models. Additionally, we perform a pilot study combining human expert assessment with LLM-based verification to explore how explicit rationale generation improves system transparency and enables more detailed diagnosis of retrieval failures in biomedical question answering.
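Abstract (reference translation): Retrieval-Augmented Generation (RAG) significantly improves the factuality of Large Language Models (LLMs), but standard pipelines lack mechanisms for verifying intermediate reasoning and so remain vulnerable to hallucinations in high-stakes domains. This paper therefore proposes a domain-specific RAG framework that integrates explicit reasoning and faithfulness verification. The architecture augments standard retrieval with neural query rewriting, BGE-based cross-encoder reranking, and a rationale generation module that grounds claims in specific evidence. It further introduces an eight-category verification taxonomy that distinguishes explicit from implicit support patterns, enabling fine-grained assessment of rationale faithfulness and facilitating structured error diagnosis. The framework is evaluated on the BioASQ and PubMedQA benchmarks, with a specific analysis of the impact of dynamic in-context learning and reranking under constrained token budgets. Experiments show that explicit rationale generation improves accuracy over vanilla RAG baselines. With Llama-3-8B-Instruct, the approach achieves 89.1% on BioASQ-Y/N and 73.0% on PubMedQA. In addition, a pilot study combining human expert assessment with LLM-based verification explores how explicit rationale generation improves system transparency and enables more detailed diagnosis of retrieval failures in biomedical question answering.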
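
The abstract mentions reranking retrieved passages under a constrained token budget. As a rough illustration only, the sketch below scores passages against a query, sorts them, and greedily keeps the highest-scoring ones that still fit the budget. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the whitespace token count, the greedy packing rule, and above all `overlap_score`, which is a toy lexical stand-in for the BGE-based cross-encoder the authors describe.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder (assumption, not the paper's model):
    the fraction of distinct query terms that also occur in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & p_terms) / len(q_terms)


def rerank_within_budget(
    query: str,
    passages: List[str],
    token_budget: int,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Sort passages by descending score, then greedily keep those that fit.

    Token cost is approximated by whitespace-separated words (an assumption;
    a real pipeline would count tokens with the model's tokenizer).
    """
    scored = sorted(
        ((score_fn(query, p), p) for p in passages),
        key=lambda pair: pair[0],
        reverse=True,
    )
    selected: List[Tuple[str, float]] = []
    used = 0
    for score, passage in scored:
        cost = len(passage.split())
        if used + cost > token_budget:
            continue  # skip passages that would exceed the budget
        selected.append((passage, score))
        used += cost
    return selected


# Hypothetical example data, not drawn from BioASQ or PubMedQA.
query = "does aspirin reduce stroke risk"
passages = [
    "aspirin may reduce the risk of stroke in some patients",
    "the weather was mild this spring",
    "stroke risk factors include hypertension",
]
for passage, score in rerank_within_budget(query, passages, token_budget=15):
    print(f"{score:.2f}  {passage}")
```

Passing a different `score_fn` is the intended extension point: a real cross-encoder scorer could be substituted there without changing the budget logic.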

Related papers