Fugu-MT Paper Translation (Abstract): End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

Paper overview: End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

arxiv url: http://arxiv.org/abs/2508.15746v1
Date: Thu, 21 Aug 2025 17:42:47 GMT
Status: Translation complete
Last updated in system: 2025-08-22 16:26:46.427185
Title: End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Title (reference translation): End-to-End Agentic RAG System Training for Traceable Diagnosis
Authors: Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
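Abstract summary: Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL). For Deep-DxSearch, a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources is constructed. Experiments show that the end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
Reference score (independently computed attention score): 52.12425911708585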
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor feedback-reasoning traceability. To address these challenges, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL) that enables steerable, traceable retrieval-augmented reasoning for medical diagnosis. In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources to support retrieval-aware reasoning across diagnostic scenarios. More crucially, we frame the LLM as the core agent and the retrieval corpus as its environment, using tailored rewards on format, retrieval, reasoning structure, and diagnostic accuracy, thereby evolving the agentic RAG policy from large-scale data through RL. Experiments demonstrate that our end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch achieves substantial gains in diagnostic accuracy, surpassing strong diagnostic baselines such as GPT-4o, DeepSeek-R1, and other medical-specific frameworks for both common and rare disease diagnosis under in-distribution and out-of-distribution settings. Moreover, ablation studies on reward design and retrieval corpus components confirm their critical roles, underscoring the uniqueness and effectiveness of our approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy, providing deeper insight into its performance gains and supporting clinicians in delivering more reliable and precise preliminary diagnoses. See https://github.com/MAGIC-AI4Med/Deep-DxSearch.
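Abstract (reference translation): Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor feedback traceability. To address these challenges, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL). In Deep-DxSearch, we first construct a large-scale medical retrieval corpus consisting of patient records and reliable medical knowledge sources, supporting retrieval-aware reasoning across diagnostic scenarios. Furthermore, with the LLM as the core agent and the retrieval corpus as its environment, we use rewards tailored to format, retrieval, reasoning structure, and diagnostic accuracy to evolve the agentic RAG policy from large-scale data through RL. Experiments show that the end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch substantially improves diagnostic accuracy, surpassing diagnostic baselines such as GPT-4o and DeepSeek-R1 as well as other medical-specific frameworks for common and rare disease diagnosis under in-distribution and out-of-distribution settings. In addition, ablation studies on reward design and retrieval corpus components confirm their critical roles and underline the uniqueness and effectiveness of the approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy, offer deeper insight into its performance gains, and support clinicians in delivering more reliable and precise preliminary diagnoses. See https://github.com/MAGIC-AI4Med/Deep-DxSearch.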
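
Illustrative sketch (not from the paper): the abstract names four reward signals for RL training: format, retrieval, reasoning structure, and diagnostic accuracy. The Python below shows one way such signals could be combined into a single scalar reward. The tag names (<think>, <search>, <diagnose>), the weights, the scoring rules, and all function names are assumptions made for illustration only; the abstract does not specify how Deep-DxSearch actually defines or weights its rewards.

```python
import re
from dataclasses import dataclass

# Hypothetical trace tags; the real trace format is not given in the abstract.
TAGS = ("think", "search", "diagnose")


@dataclass(frozen=True)
class Weights:
    # Hypothetical weights chosen only so the example sums to 1.0.
    w_format: float = 0.1
    w_retrieval: float = 0.2
    w_structure: float = 0.2
    w_accuracy: float = 0.5


def format_score(trace: str) -> float:
    """1.0 if every tag is balanced and a <diagnose> block exists, else 0.0."""
    for tag in TAGS:
        if trace.count(f"<{tag}>") != trace.count(f"</{tag}>"):
            return 0.0
    return 1.0 if "<diagnose>" in trace else 0.0


def retrieval_score(retrieved: list[str], relevant: set[str]) -> float:
    """Precision of retrieved document ids against a set of relevant ids."""
    if not retrieved:
        return 0.0
    return sum(doc in relevant for doc in retrieved) / len(retrieved)


def structure_score(trace: str) -> float:
    """1.0 if a reasoning step appears before the final diagnosis."""
    think = trace.find("<think>")
    diagnose = trace.find("<diagnose>")
    return 1.0 if 0 <= think < diagnose else 0.0


def accuracy_score(trace: str, gold: str) -> float:
    """1.0 if the diagnosis text matches the gold label (case-insensitive)."""
    match = re.search(r"<diagnose>(.*?)</diagnose>", trace, re.S)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(trace: str, retrieved: list[str], relevant: set[str],
                 gold: str, w: Weights = Weights()) -> float:
    """Weighted sum of the four component scores."""
    return (w.w_format * format_score(trace)
            + w.w_retrieval * retrieval_score(retrieved, relevant)
            + w.w_structure * structure_score(trace)
            + w.w_accuracy * accuracy_score(trace, gold))


example_trace = ("<think>fever and rash</think>"
                 "<search>measles</search>"
                 "<diagnose>Measles</diagnose>")
example_reward = total_reward(example_trace, ["d1", "d2"], {"d1"}, "measles")
```

In an RL loop, a scalar of this kind would be computed per rollout and passed to the policy-gradient update. A weighted sum is only the simplest possible combination; the paper may gate or shape its rewards differently.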

Related papers