Fugu-MT Paper Translation (Abstract): MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains

Paper summary: MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains

arxiv url: http://arxiv.org/abs/2508.18260v1
Date: Mon, 25 Aug 2025 17:53:22 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.903655
Title: MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
Authors: Kaiwen Wei, Rui Shan, Dongsheng Zou, Jianzhong Yang, Bi Zhao, Junnan Zhu, Jiang Zhong
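Abstract summary: MIRAGE (Multi-chain Inference with Retrieval-Augmented Graph Exploration) is a test-time scalable reasoning framework that performs dynamic multi-chain inference over structured medical knowledge graphs. It consistently outperforms GPT-4o, Tree-of-Thought variants, and other retrieval-augmented baselines in both automatic and human evaluations.
Reference score (independently computed attention score): 19.018680886214035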
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large reasoning models (LRMs) have shown significant progress in test-time scaling through chain-of-thought prompting. Current approaches like Search-o1 integrate retrieval-augmented generation (RAG) into multi-step reasoning processes but rely on a single, linear reasoning chain while incorporating unstructured textual information in a flat, context-agnostic manner. As a result, these approaches can lead to error accumulation throughout the reasoning chain, which significantly limits their effectiveness in medical question-answering (QA) tasks where both accuracy and traceability are critical requirements. To address these challenges, we propose MIRAGE (Multi-chain Inference with Retrieval-Augmented Graph Exploration), a novel test-time scalable reasoning framework that performs dynamic multi-chain inference over structured medical knowledge graphs. Specifically, MIRAGE (1) decomposes complex queries into entity-grounded sub-questions, (2) executes parallel inference chains, (3) retrieves evidence adaptively via neighbor expansion and multi-hop traversal, and (4) integrates answers using cross-chain verification to resolve contradictions. Experiments on three medical QA benchmarks (GenMedGPT-5k, CMCQA, and ExplainCPE) show that MIRAGE consistently outperforms GPT-4o, Tree-of-Thought variants, and other retrieval-augmented baselines in both automatic and human evaluations. Additionally, MIRAGE improves interpretability by generating explicit reasoning chains that trace each factual claim to concrete chains within the knowledge graph, making it well-suited for complex medical reasoning scenarios. The code will be available for further research.
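
The four steps named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, whose code is not part of this page. The toy triples, every function name, and the use of set intersection as a stand-in for cross-chain verification are assumptions made for illustration only. Query decomposition (step 1) is skipped: the sub-questions are assumed to arrive already grounded as symptom entities.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Toy knowledge graph as (head, relation, tail) triples.
# Invented for illustration; not taken from the paper or its benchmarks.
TRIPLES = [
    ("flu", "has_symptom", "fever"),
    ("flu", "has_symptom", "cough"),
    ("flu", "treated_by", "oseltamivir"),
    ("cold", "has_symptom", "cough"),
    ("cold", "treated_by", "rest"),
]

def build_adjacency(triples):
    """Index triples in both directions; inverse edges get an '_inv' suffix."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((rel + "_inv", head))
    return adj

def neighbor_expansion(adj, entity):
    """Return the 1-hop (relation, neighbor) pairs around an entity."""
    return list(adj.get(entity, []))

def multi_hop_path(adj, source, target, max_hops=3):
    """Breadth-first search for a shortest path of at most max_hops edges.

    Returns the path as a list of (node, relation, next_node) steps,
    or None if the target is not reachable within the hop limit.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) == max_hops:
            continue
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

def run_chain(adj, symptom):
    """One reasoning chain: candidate diseases for a symptom, each with its evidence triple."""
    return {
        disease: (disease, "has_symptom", symptom)
        for rel, disease in neighbor_expansion(adj, symptom)
        if rel == "has_symptom_inv"
    }

def answer(adj, symptoms):
    """Run one chain per symptom in parallel, then keep only candidates every chain supports."""
    if not symptoms:
        return {}
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: run_chain(adj, s), symptoms))
    agreed = set.intersection(*(set(r) for r in results))
    # Each surviving answer carries the triples that support it, one per chain.
    return {d: [r[d] for r in results] for d in sorted(agreed)}
```

Calling `answer(build_adjacency(TRIPLES), ["fever", "cough"])` keeps only `flu`, since the `cough` chain also proposes `cold` but the `fever` chain does not. The returned evidence triples are what would make such an answer traceable back to the graph.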
