Fugu-MT Paper Translation (Summary): XAI-CLIP: ROI-Guided Perturbation Framework for Explainable Medical Image Segmentation in Multimodal Vision-Language Models

Paper summary: XAI-CLIP: ROI-Guided Perturbation Framework for Explainable Medical Image Segmentation in Multimodal Vision-Language Models

arxiv url: http://arxiv.org/abs/2602.07017v1
Date: Sun, 01 Feb 2026 00:27:06 GMT
Status: Translation complete
Last updated in system: 2026-02-10 20:26:24.359161
Title: XAI-CLIP: ROI-Guided Perturbation Framework for Explainable Medical Image Segmentation in Multimodal Vision-Language Models
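Title (reference translation): XAI-CLIP: An ROI-Guided Perturbation Framework for Explainable Medical Image Segmentation in Multimodal Vision-Language Models
Authors: Thuraya Alzubaidi, Sana Ammar, Maryam Alsharqi, Islem Rekik, Muzammil Behzad
Abstract summary: XAI-CLIP is an ROI-guided perturbation framework for explainable medical image segmentation. It integrates language-informed region localization with medical image segmentation and applies targeted, region-aware perturbations. Compared with conventional perturbation methods, XAI-CLIP achieves up to a 60% reduction in runtime, a 44.6% improvement in Dice score, and a 96.7% increase in Intersection-over-Union for occlusion-based explanations.
Reference score (attention score computed independently by this site): 4.5236257764997205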
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical image segmentation is a critical component of clinical workflows, enabling accurate diagnosis, treatment planning, and disease monitoring. However, despite the superior performance of transformer-based models over convolutional architectures, their limited interpretability remains a major obstacle to clinical trust and deployment. Existing explainable artificial intelligence (XAI) techniques, including gradient-based saliency methods and perturbation-based approaches, are often computationally expensive, require numerous forward passes, and frequently produce noisy or anatomically irrelevant explanations. To address these limitations, we propose XAI-CLIP, an ROI-guided perturbation framework that leverages multimodal vision-language model embeddings to localize clinically meaningful anatomical regions and guide the explanation process. By integrating language-informed region localization with medical image segmentation and applying targeted, region-aware perturbations, the proposed method generates clearer, boundary-aware saliency maps while substantially reducing computational overhead. Experiments conducted on the FLARE22 and CHAOS datasets demonstrate that XAI-CLIP achieves up to a 60% reduction in runtime, a 44.6% improvement in Dice score, and a 96.7% increase in Intersection-over-Union for occlusion-based explanations compared to conventional perturbation methods. Qualitative results further confirm cleaner and more anatomically consistent attribution maps with fewer artifacts, highlighting that the incorporation of multimodal vision-language representations into perturbation-based XAI frameworks significantly enhances both interpretability and efficiency, thereby enabling transparent and clinically deployable medical image segmentation systems.
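Abstract (reference translation): Medical image segmentation is a key component of clinical workflows, enabling accurate diagnosis, treatment planning, and disease monitoring. However, although transformer-based models outperform convolutional architectures, their limited interpretability remains a major obstacle to clinical trust and deployment. Existing explainable artificial intelligence (XAI) techniques, including gradient-based saliency methods and perturbation-based approaches, are often computationally expensive, require many forward passes, and frequently produce noisy or anatomically irrelevant explanations. To address these limitations, we propose XAI-CLIP, an ROI-guided perturbation framework that uses multimodal vision-language model embeddings to localize clinically meaningful anatomical regions and guide the explanation process. By integrating language-informed region localization with medical image segmentation and applying targeted, region-aware perturbations, the method generates clearer, boundary-aware saliency maps while substantially reducing computational overhead. Experiments on the FLARE22 and CHAOS datasets show that, compared with conventional perturbation methods, XAI-CLIP achieves up to a 60% reduction in runtime, a 44.6% improvement in Dice score, and a 96.7% increase in Intersection-over-Union for occlusion-based explanations. Qualitative results further confirm cleaner, more anatomically consistent attribution maps with fewer artifacts, indicating that incorporating multimodal vision-language representations into perturbation-based XAI frameworks substantially improves both interpretability and efficiency and enables transparent, clinically deployable medical image segmentation systems.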
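
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of restricting occlusion-based perturbation to a region of interest, not the authors' implementation. Everything in it is an assumption made for illustration: the names `occlusion_map`, `dice_and_iou`, and `toy_score`, the 8x8 synthetic image, and the use of the ground-truth organ mask as a stand-in for a language-guided ROI. The scoring function deliberately reacts to one background pixel to mimic a spurious sensitivity.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def dice_and_iou(pred: Mask, ref: Mask) -> Tuple[float, float]:
    """Dice coefficient and IoU between two binary masks of equal shape."""
    inter = sum(p & r for pr, rr in zip(pred, ref) for p, r in zip(pr, rr))
    n_pred = sum(map(sum, pred))
    n_ref = sum(map(sum, ref))
    union = n_pred + n_ref - inter
    dice = 2 * inter / (n_pred + n_ref) if (n_pred + n_ref) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def occlusion_map(image: Image, score_fn: Callable[[Image], float], patch: int,
                  roi: Optional[Mask] = None, fill: float = 0.0):
    """Patch-wise occlusion saliency.

    If `roi` is given, patches that do not overlap it are skipped, which
    saves one forward pass per skipped patch.
    Returns (saliency map, number of score_fn calls).
    """
    h, w = len(image), len(image[0])
    base = score_fn(image)
    passes = 1
    saliency = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            if roi is not None and not any(roi[r][c] for r in rows for c in cols):
                continue  # outside the ROI: no perturbation, no forward pass
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)  # importance = score drop
            passes += 1
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency, passes


def binarize(saliency: List[List[float]]) -> Mask:
    """Mark every pixel whose occlusion lowered the score."""
    return [[1 if v > 0 else 0 for v in row] for row in saliency]


# Toy data: a 4x4 "organ" in an 8x8 image plus one bright background pixel.
organ: Mask = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
               for r in range(8)]
image: Image = [[float(v) for v in row] for row in organ]
image[0][7] = 1.0


def toy_score(img: Image) -> float:
    """Hypothetical model score: organ intensity plus a spurious background term."""
    organ_sum = sum(img[r][c] for r in range(2, 6) for c in range(2, 6))
    return organ_sum + 0.5 * img[0][7]


full_sal, full_passes = occlusion_map(image, toy_score, patch=2)
roi_sal, roi_passes = occlusion_map(image, toy_score, patch=2, roi=organ)
full_dice, full_iou = dice_and_iou(binarize(full_sal), organ)
roi_dice, roi_iou = dice_and_iou(binarize(roi_sal), organ)
print(full_passes, roi_passes, full_dice, full_iou, roi_dice, roi_iou)
```

In this toy setup the ROI-restricted run needs 5 scoring calls instead of 17 and its thresholded map no longer includes the background artifact. The numbers only illustrate the mechanism and say nothing about the results reported in the paper.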

Related papers

Uncertainty-Aware Vision-Language Segmentation for Medical Imaging [12.545486211087791]
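We propose a novel uncertainty-aware multimodal segmentation framework for medical diagnosis. To achieve efficient cross-modal fusion, we introduce the Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix). This work highlights the importance of incorporating uncertainty modeling and structured modality alignment in vision-language medical segmentation tasks.
Paper reference translation (metadata) (2026-02-16T06:27:51Z)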
Structure-constrained Language-informed Diffusion Model for Unpaired Low-dose Computed Tomography Angiography Reconstruction [72.80209358480424]
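An excessive dose of iodinated contrast media (ICM) can cause kidney damage and life-threatening allergic reactions. Deep learning methods can generate normal-dose ICM CT images from low-dose ICM images, reducing the required dose. This work proposes a Structure-constrained Language-informed Diffusion Model (SLDM) that integrates structural synergy and spatial intelligence.
Paper reference translation (metadata) (2026-01-28T06:54:06Z)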
Anatomical Region-Guided Contrastive Decoding: A Plug-and-Play Strategy for Mitigating Hallucinations in Medical VLMs [20.507007953026346]
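Anatomical Region-Guided Contrastive Decoding (ARCD) is a plug-and-play strategy that mitigates hallucinations by providing targeted, region-specific guidance. The method is effective at improving regional understanding, reducing hallucinations, and raising overall diagnostic accuracy.
Paper reference translation (metadata) (2025-12-19T03:11:20Z)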
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
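MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. TongueAtlas-4K is a benchmark of 4,000 images annotated with 22 diagnostic labels.
Paper reference translation (metadata) (2025-11-13T06:30:41Z)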
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
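Medical report generation aims to produce clinically accurate descriptions of medical images. This paper proposes Self-Supervised Anatomical Consistency Learning (SS-ACL), which aligns generated reports with the corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
Paper reference translation (metadata) (2025-09-30T08:59:06Z)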
DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision [9.254163621425727]
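DiSSECT is a framework that integrates multi-scale vector quantization into the self-supervised learning (SSL) pipeline to impose a discrete representational bottleneck. It achieves strong performance on both classification and segmentation tasks with minimal or no fine-tuning. DiSSECT is validated across multiple public medical imaging datasets, demonstrating its robustness and generalizability.
Paper reference translation (metadata) (2025-09-23T07:58:21Z)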
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning [60.03671205298294]
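Medical visual question answering aims to support clinical decision-making by having models answer natural-language questions based on medical images. Current methods still suffer from limited answer reliability and poor interpretability. This work first proposes a region-aware multimodal chain-of-thought dataset in which answer generation is preceded by a sequence of intermediate reasoning steps.
Paper reference translation (metadata) (2025-06-22T08:09:58Z)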
MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network [51.68708264694361]
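Confounding factors, such as complex anatomical variations and imaging modality limitations, can affect medical images. We propose a backdoor-intervention optimization network for medical image segmentation. The method significantly reduces the influence of confounding factors and improves segmentation accuracy.
Paper reference translation (metadata) (2025-05-28T01:40:10Z)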
Federated Learning for Coronary Artery Plaque Detection in Atherosclerosis Using IVUS Imaging: A Multi-Hospital Collaboration [8.358846277772779]
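Traditional interpretation of intravascular ultrasound (IVUS) images during percutaneous coronary intervention (PCI) is time-intensive and inconsistent. A parallel 2D U-Net model with a multi-stage segmentation architecture was developed. With a Dice similarity coefficient (DSC) of 0.706, it effectively identifies plaques and detects circular boundaries in real time.
Paper reference translation (metadata) (2024-12-19T13:06:28Z)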
Augmentation is AUtO-Net: Augmentation-Driven Contrastive Multiview Learning for Medical Image Segmentation [3.1002416427168304]
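This thesis focuses on the task of retinal blood vessel segmentation. It provides an extensive literature review of deep-learning-based medical image segmentation approaches and proposes an efficient, simple multiview learning framework.
Paper reference translation (metadata) (2023-11-02T06:31:08Z)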
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
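Clinical decision support using deep neural networks has become a topic of steadily growing interest. Clinicians are often hesitant to adopt the technology because the underlying decision-making process is opaque and hard to understand. We therefore propose a novel decision explanation method based on CycleGAN activation maximization that generates high-quality visualizations of classifier decisions even on smaller datasets.
Paper reference translation (metadata) (2020-10-09T14:39:27Z)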

The related papers list is generated automatically from the titles and abstracts of papers on this site.
