Fugu-MT paper translation (abstract): MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

arxiv url: http://arxiv.org/abs/2603.23085v1
Date: Tue, 24 Mar 2026 11:28:15 GMT
Status: translation complete
Last updated in system: 2026-03-25 19:53:37.452632
Title: MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models
Title (reference translation): MedCausalX: Self-Reflective Adaptive Causal Reasoning for Trustworthy Medical Vision-Language Models
Authors: Jianxin Lin, Chunzheng Zhu, Peter J. Kneuertz, Yunfei Bai, Yuan Xue,
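Abstract summary: Existing medical chain-of-thought models lack explicit mechanisms to represent and enforce causal reasoning. MedCausalX is an end-to-end framework that explicitly models causal reasoning chains in medical VLMs. We show that MedCausalX consistently outperforms state-of-the-art methods, improving diagnostic consistency by +5.4 points, reducing hallucination by more than 10 points, and reaching the top spatial grounding IoU.
Reference score (attention score computed by this site): 10.466505116993451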
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious correlations and limiting their clinical reliability. We pinpoint three core challenges in medical CoT reasoning: how to adaptively trigger causal correction, how to construct high-quality causal-spurious contrastive samples, and how to maintain causal consistency across reasoning trajectories. To address these challenges, we propose MedCausalX, an end-to-end framework that explicitly models causal reasoning chains in medical VLMs. We first introduce the CRMed dataset, which provides fine-grained anatomical annotations, structured causal reasoning chains, and counterfactual variants that guide the learning of causal relationships beyond superficial correlations. Building upon CRMed, MedCausalX employs a two-stage adaptive reflection architecture equipped with $\langle$causal$\rangle$ and $\langle$verify$\rangle$ tokens, enabling the model to autonomously determine when and how to perform causal analysis and verification. Finally, a trajectory-level causal correction objective optimized through error-attributed reinforcement learning refines the reasoning chain, allowing the model to distinguish genuine causal dependencies from shortcut associations. Extensive experiments on multiple benchmarks show that MedCausalX consistently outperforms state-of-the-art methods, improving diagnostic consistency by +5.4 points, reducing hallucination by over 10 points, and attaining top spatial grounding IoU, thereby setting a new standard for causally grounded medical reasoning.
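Abstract (reference translation): Vision-Language Models (VLMs) make interpretable diagnosis possible by integrating visual perception with linguistic reasoning. However, existing medical chain-of-thought (CoT) models lack explicit mechanisms for representing and enforcing causal reasoning, which leaves them vulnerable to spurious correlations and limits their clinical reliability. We identify three challenges in medical CoT reasoning: how to adaptively trigger causal correction, how to construct high-quality causal-spurious contrastive samples, and how to maintain causal consistency across reasoning trajectories. To address these challenges, we propose MedCausalX, an end-to-end framework that explicitly models causal reasoning chains in medical VLMs. We first introduce the CRMed dataset, which provides fine-grained anatomical annotations, structured causal reasoning chains, and counterfactual variants that guide the learning of causal relationships beyond superficial correlations. Built on CRMed, MedCausalX adopts a two-stage adaptive reflection architecture equipped with $\langle$causal$\rangle$ and $\langle$verify$\rangle$ tokens, so that the model can decide on its own when and how to perform causal analysis and verification. Finally, a trajectory-level causal correction objective, optimized through error-attributed reinforcement learning, refines the reasoning chain and lets the model distinguish genuine causal dependencies from shortcut associations. Extensive experiments on multiple benchmarks show that MedCausalX consistently outperforms state-of-the-art methods, improving diagnostic consistency by +5.4 points, reducing hallucination by more than 10 points, and achieving the top spatial grounding IoU, establishing a new standard for causally grounded medical reasoning.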
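The abstract reports spatial grounding IoU but does not say how it is computed. As a rough illustration only, the sketch below shows the standard intersection-over-union for two axis-aligned bounding boxes. The `(x_min, y_min, x_max, y_max)` box format and the function name `box_iou` are assumptions made for this example and are not taken from the paper, which may use masks or a different box convention.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, target: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (hypothetical helper)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(pred[0], target[0])
    iy_min = max(pred[1], target[1])
    ix_max = min(pred[2], target[2])
    iy_max = min(pred[3], target[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_target = (target[2] - target[0]) * (target[3] - target[1])
    union = area_pred + area_target - inter

    # Degenerate (zero-area) inputs are scored as 0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A grounding score of this kind is typically averaged over all predicted regions in a benchmark; whether MedCausalX averages per image or per finding is not stated in the abstract.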

Related papers

Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs [63.535652574541764]
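Multimodal Large Language Models (MLLMs) have shown remarkable potential in medical image analysis. Their application to gastrointestinal endoscopy is currently hindered by two key limitations. We propose a novel Clinical Cognition Alignment (CogAlign) framework to address these challenges.
Reference translation (metadata) (2026-03-21T07:47:37Z)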
Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning [16.144050164828794]
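We propose DRL, a framework that improves clinical agents by learning from reasoning discrepancies. DRL extracts reasoning graphs as directed acyclic graphs (DAGs) and performs discrepancy analysis based on a clinically weighted graph edit distance (GED). At inference time, it retrieves the top-$k$ instructions to augment the agent prompt and patch likely logic gaps.
Reference translation (metadata) (2026-02-10T16:29:32Z)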
PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization [6.821738567680833]
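PathReasoner is the first large-scale dataset for WSI reasoning. PathReasoner-R1 combines supervised fine-tuning with reasoning-oriented reinforcement learning to instill structured chain-of-thought capability. Experiments show that PathReasoner-R1 achieves state-of-the-art performance on both PathReasoner and public benchmarks across a range of image scales.
Reference translation (metadata) (2026-01-29T12:21:16Z)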
Making medical vision-language models think causally across modalities with retrieval-augmented cross-modal reasoning [16.243806723551454]
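Medical vision-language models (VLMs) achieve strong performance in diagnostic reporting and image-text alignment. Their underlying reasoning mechanisms, however, remain fundamentally correlational and rely on superficial statistical associations. We propose Multimodal Causal Retrieval-Augmented Generation, a framework that integrates causal inference principles with multimodal retrieval.
Reference translation (metadata) (2026-01-26T11:03:00Z)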
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding [66.78251988482222]
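Chain-of-Thought (CoT) reasoning has proven effective for strengthening large language models by encouraging step-by-step intermediate reasoning. Current benchmarks for medical image understanding focus on the final answer while ignoring the reasoning path. M3CoTBench aims to foster the development of transparent, trustworthy, and diagnostically accurate medical AI systems.
Reference translation (metadata) (2026-01-13T17:42:27Z)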
MM-CoT:A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models [49.32415342913976]
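We introduce MM-CoT, a diagnostic benchmark for probing the visual grounding and logical coherence of CoT reasoning in multimodal models. We evaluate leading vision-language models on MM-CoT and find that even the most advanced systems struggle, revealing a gap between generative fluency and true reasoning fidelity.
Reference translation (metadata) (2025-12-09T04:13:31Z)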
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
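MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. TongueAtlas-4K is a benchmark of 4,000 images annotated with 22 diagnostic labels.
Reference translation (metadata) (2025-11-13T06:30:41Z)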
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
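We develop MedAlign, a framework for ensuring visually grounded LVLM responses in medical visual question answering (Med-VQA). First, we propose a multimodal Direct Preference Optimization (mDPO) objective that aligns preference learning with the visual context. Next, we design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that uses image and text similarity to route queries to a specialized, context-augmented LVLM.
Reference translation (metadata) (2025-10-24T02:11:05Z)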
Benchmarking and Mitigate Sycophancy in Medical Vision-Language Models [21.353225217216252]
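Vision-language models often exhibit sycophantic behavior, prioritizing agreement with the user's phrasing, social cues, or perceived authority over evidence-based reasoning. This study investigates sycophancy in medical visual question answering using a new clinically oriented evaluation benchmark.
Reference translation (metadata) (2025-09-26T07:02:22Z)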

The related papers list is generated automatically from the titles and abstracts of papers on this site.
