Fugu-MT Paper Translation (Summary): Uncertainty Quantification for Retrieval-Augmented Reasoning

Paper Summary: Uncertainty Quantification for Retrieval-Augmented Reasoning

arxiv url: http://arxiv.org/abs/2510.11483v1
Date: Mon, 13 Oct 2025 14:55:28 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:30.412909
Title: Uncertainty Quantification for Retrieval-Augmented Reasoning
Title (reference translation): Uncertainty Quantification in Retrieval-Augmented Reasoning
Authors: Heydar Soudani, Hamed Zamani, Faegheh Hasibi
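Abstract summary: Retrieval-augmented reasoning (RAR) is a recent evolution of retrieval-augmented generation (RAG) that uses multiple reasoning steps for retrieval and generation. Uncertainty quantification (UQ) provides methods for estimating the confidence of a system's outputs. This paper introduces Retrieval-Augmented Reasoning Consistency (R2C), a new UQ method for RAR.
Reference score (site-computed attention score): 40.43455995861054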
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-augmented reasoning (RAR) is a recent evolution of retrieval-augmented generation (RAG) that employs multiple reasoning steps for retrieval and generation. While effective for some complex queries, RAR remains vulnerable to errors and misleading outputs. Uncertainty quantification (UQ) offers methods to estimate the confidence of systems' outputs. These methods, however, often handle simple queries with no retrieval or single-step retrieval, without properly handling RAR setup. Accurate estimation of UQ for RAR requires accounting for all sources of uncertainty, including those arising from retrieval and generation. In this paper, we account for all these sources and introduce Retrieval-Augmented Reasoning Consistency (R2C)--a novel UQ method for RAR. The core idea of R2C is to perturb the multi-step reasoning process by applying various actions to reasoning steps. These perturbations alter the retriever's input, which shifts its output and consequently modifies the generator's input at the next step. Through this iterative feedback loop, the retriever and generator continuously reshape one another's inputs, enabling us to capture uncertainty arising from both components. Experiments on five popular RAR systems across diverse QA datasets show that R2C improves AUROC by over 5% on average compared to the state-of-the-art UQ baselines. Extrinsic evaluations using R2C as an external signal further confirm its effectiveness for two downstream tasks: in Abstention, it achieves ~5% gains in both F1Abstain and AccAbstain; in Model Selection, it improves the exact match by ~7% over single models and ~3% over selection methods.
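The core idea described in the abstract (perturb reasoning steps, rerun the retrieve-and-generate loop, and measure how stable the final answer is) can be illustrated with a generic consistency-style confidence score. This is a minimal Python sketch under stated assumptions, not the authors' R2C implementation: the `run_rar` interface, the action names in `ACTIONS`, and the "share of perturbed runs that agree with the unperturbed answer" scoring rule are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical perturbation actions applied to one reasoning step.
# Illustrative names only; the paper's actual action set may differ.
ACTIONS = ("paraphrase_query", "drop_step", "rerank_docs", "resample_thought")

# Assumed interface: run the full multi-step retrieve-and-reason loop,
# applying `action` at some step (None = unperturbed run), and return the final answer.
RarRunner = Callable[[str, Optional[str]], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(answer.lower().split())


def consistency_confidence(
    question: str, run_rar: RarRunner, actions: Sequence[str] = ACTIONS
) -> Tuple[str, float]:
    """Return the unperturbed answer and the share of perturbed runs that reproduce it."""
    if not actions:
        raise ValueError("need at least one perturbation action")
    original = normalize(run_rar(question, None))
    perturbed = [normalize(run_rar(question, action)) for action in actions]
    agreeing = sum(answer == original for answer in perturbed)
    return original, agreeing / len(perturbed)


def toy_rar(question: str, action: Optional[str]) -> str:
    """Toy stand-in for a RAR pipeline: only the `drop_step` perturbation changes the answer."""
    if action == "drop_step":
        return "Lyon"
    return " Paris "


answer, confidence = consistency_confidence("What is the capital of France?", toy_rar)
print(answer, confidence)  # paris 0.75
```

Here three of the four perturbed runs reproduce the unperturbed answer, so the score is 0.75. In a real RAR system, each action would alter the retriever's input at a chosen step so that the change propagates through later retrieval and generation, which is the feedback loop the abstract describes.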
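Abstract (reference translation): Retrieval-augmented reasoning (RAR) is a recent evolution of retrieval-augmented generation (RAG) that uses multiple reasoning steps for retrieval and generation. Although effective for some complex queries, RAR remains vulnerable to errors and misleading outputs. Uncertainty quantification (UQ) provides methods for estimating the confidence of a system's outputs. However, these methods often handle simple queries with no retrieval or with single-step retrieval, and do not properly handle the RAR setup. Accurate UQ estimation for RAR requires accounting for every source of uncertainty, including those arising from retrieval and generation. In this paper, we account for all of these sources and introduce Retrieval-Augmented Reasoning Consistency (R2C), a new UQ method for RAR. The core idea of R2C is to perturb the multi-step reasoning process by applying various actions to its reasoning steps. These perturbations change the retriever's input, which shifts its output and in turn changes the generator's input at the next step. Through this iterative feedback loop, the retriever and generator continually reshape each other's inputs, which makes it possible to capture uncertainty arising from both components. Experiments with five popular RAR systems across diverse QA datasets show that R2C improves AUROC by more than 5% on average over state-of-the-art UQ baselines. In extrinsic evaluations that use R2C as an external signal, it achieves gains of about 5% in both F1Abstain and AccAbstain for abstention, and for model selection it improves exact match by about 7% over single models and about 3% over selection methods.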
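The AUROC evaluation and the two downstream tasks mentioned in the abstract (abstention and model selection) all consume a per-answer confidence score. The following is a minimal Python sketch, assuming confidences in [0, 1] from any UQ method and binary correctness labels. `auroc` uses the standard Mann-Whitney formulation of AUROC; the fixed-threshold abstention rule, the highest-confidence selection rule, and the names `answer_or_abstain`, `select_model`, and the 0.5 threshold are hypothetical simplifications, not the paper's exact protocols. The F1Abstain and AccAbstain metrics are not reproduced here.

```python
from typing import Dict, Optional, Sequence, Tuple


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """AUROC of confidence as a predictor of correctness (Mann-Whitney form).

    It is the probability that a randomly chosen correct answer receives a higher
    confidence than a randomly chosen incorrect one; ties count as 0.5.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def answer_or_abstain(answer: str, confidence: float, threshold: float = 0.5) -> Optional[str]:
    """Hypothetical abstention rule: return the answer only if confidence reaches the threshold."""
    return answer if confidence >= threshold else None


def select_model(candidates: Dict[str, Tuple[str, float]]) -> Tuple[str, str]:
    """Hypothetical selection rule: keep the model whose answer has the highest confidence."""
    best = max(candidates, key=lambda name: candidates[name][1])
    return best, candidates[best][0]


print(auroc([0.9, 0.4, 0.6, 0.4], [True, True, False, False]))  # 0.625
print(answer_or_abstain("lyon", 0.25))  # None
print(select_model({"model_a": ("paris", 0.75), "model_b": ("lyon", 0.5)}))  # ('model_a', 'paris')
```

An AUROC of 1.0 means every correct answer is scored above every incorrect one, and 0.5 corresponds to chance. A better-ranked confidence signal is what lets a simple threshold or argmax rule avoid wrong answers in the downstream tasks.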


Related Papers