Fugu-MT Paper Translation (Abstract): Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models

Paper summary: Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models

arxiv url: http://arxiv.org/abs/2510.18303v1
Date: Tue, 21 Oct 2025 05:18:18 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:12.93058
Title: Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
Title (reference translation): A Proactive Reasoning-and-Retrieval Framework for Medical Multimodal Large Language Models
Authors: Lehan Wang, Yi Qin, Honglong Yang, Xiaomeng Li
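Abstract summary: We propose Med-RwR, the first multimodal medical reasoning-with-retrieval framework. Med-RwR actively retrieves external knowledge by querying observed symptoms or domain-specific medical concepts during reasoning. Evaluation on various public medical benchmarks shows that Med-RwR improves significantly over baseline models.
Reference score (independently computed attention score): 15.530083855947987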
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Incentivizing the reasoning ability of Multimodal Large Language Models (MLLMs) is essential for medical applications to transparently analyze medical scans and provide reliable diagnosis. However, existing medical MLLMs rely solely on internal knowledge during reasoning, leading to hallucinated reasoning and factual inaccuracies when encountering cases beyond their training scope. Although recent Agentic Retrieval-Augmented Generation (RAG) methods elicit the medical model's proactive retrieval ability during reasoning, they are confined to unimodal LLMs, neglecting the crucial visual information during reasoning and retrieval. Consequently, we propose the first Multimodal Medical Reasoning-with-Retrieval framework, Med-RwR, which actively retrieves external knowledge by querying observed symptoms or domain-specific medical concepts during reasoning. Specifically, we design a two-stage reinforcement learning strategy with tailored rewards that stimulate the model to leverage both visual diagnostic findings and textual clinical information for effective retrieval. Building on this foundation, we further propose a Confidence-Driven Image Re-retrieval (CDIR) method for test-time scaling when low prediction confidence is detected. Evaluation on various public medical benchmarks demonstrates Med-RwR's significant improvements over baseline models, proving the effectiveness of enhancing reasoning capabilities with external knowledge integration. Furthermore, Med-RwR demonstrates remarkable generalizability to unfamiliar domains, evidenced by 8.8% performance gain on our proposed EchoCardiography Benchmark (ECBench), despite the scarcity of echocardiography data in the training corpus. Our data, model, and codes will be made publicly available at https://github.com/xmed-lab/Med-RwR.
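Abstract (reference translation): Strengthening the reasoning ability of Multimodal Large Language Models (MLLMs) is essential for medical applications that must analyze medical scans transparently and provide reliable diagnoses. However, existing medical MLLMs rely solely on internal knowledge during reasoning, which leads to hallucinated reasoning and factual inaccuracies when they encounter cases beyond their training scope. Recent Agentic Retrieval-Augmented Generation (RAG) methods elicit a medical model's proactive retrieval ability during reasoning, but they are limited to unimodal LLMs and neglect the crucial visual information during reasoning and retrieval. We therefore propose Med-RwR, the first multimodal medical reasoning-with-retrieval framework, which actively retrieves external knowledge by querying observed symptoms or domain-specific medical concepts during reasoning. Specifically, we design a two-stage reinforcement learning strategy with tailored rewards that encourages the model to use both visual diagnostic findings and textual clinical information for effective retrieval. Building on this, we further propose Confidence-Driven Image Re-retrieval (CDIR), a test-time scaling method applied when low prediction confidence is detected. Evaluation on various public medical benchmarks shows that Med-RwR improves significantly over baseline models, demonstrating the effectiveness of enhancing reasoning with external knowledge integration. Med-RwR also generalizes well to unfamiliar domains, as evidenced by an 8.8% performance gain on the proposed EchoCardiography Benchmark (ECBench), despite the scarcity of echocardiography data in the training corpus. The data, model, and code will be made publicly available at https://github.com/xmed-lab/Med-RwR.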
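
Illustrative sketch: the abstract says only that CDIR performs image re-retrieval at test time when low prediction confidence is detected. The Python below is a minimal sketch of that general idea, not the authors' implementation. The names `Prediction`, `answer_with_reretrieval`, `toy_predict`, and `toy_retrieve`, the 0.5 threshold, the way confidence is computed, and the rule of keeping the more confident answer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def answer_with_reretrieval(
    predict: Callable[[str, List[str]], Prediction],
    retrieve_by_image: Callable[[str], List[str]],
    question: str,
    image_id: str,
    context: List[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Prediction, bool]:
    """Return the final prediction and whether re-retrieval was triggered."""
    first = predict(question, context)
    if first.confidence >= threshold:
        return first, False  # confident enough: no extra retrieval
    # Low confidence: fetch additional passages keyed on the image and retry.
    extra = retrieve_by_image(image_id)
    second = predict(question, context + extra)
    # Assumption: keep whichever attempt is more confident.
    best = second if second.confidence > first.confidence else first
    return best, True


def toy_predict(question: str, context: List[str]) -> Prediction:
    # Toy stand-in for a model: confidence grows with supporting passages.
    hits = sum("effusion" in passage for passage in context)
    answer = "pericardial effusion" if hits else "unknown"
    return Prediction(answer, min(1.0, 0.3 + 0.3 * hits))


def toy_retrieve(image_id: str) -> List[str]:
    # Toy stand-in for an image-keyed retriever over an external knowledge base.
    return ["An echo-free space around the heart suggests pericardial effusion."]


result, triggered = answer_with_reretrieval(
    toy_predict, toy_retrieve, "What is the finding?", "echo_001", context=[]
)
```

In this toy run the first attempt has no supporting context, so its confidence falls below the threshold and the second attempt, made with the retrieved passage, is kept. A real system would derive confidence from the model itself, for example from answer-token probabilities, which the abstract does not specify.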
