Fugu-MT Paper Translation (Summary): Enhancing Medical Visual Grounding via Knowledge-guided Spatial Prompts

Paper summary: Enhancing Medical Visual Grounding via Knowledge-guided Spatial Prompts

arxiv url: http://arxiv.org/abs/2604.01915v1
Date: Thu, 02 Apr 2026 11:31:30 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.709329
Title: Enhancing Medical Visual Grounding via Knowledge-guided Spatial Prompts
Title (reference translation): Enhancing Medical Visual Grounding with Knowledge-Guided Spatial Prompts
Authors: Yifan Gao, Tao Zhou, Yi Zhou, Ke Zou, Yizhe Zhang, Huazhu Fu
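Abstract summary: Medical Visual Grounding (MVG) aims to identify relevant phrases in free-text radiology reports and localize the corresponding regions in medical images. We propose KnowMVG, which includes a knowledge-enhanced prompting strategy that encodes phrase-related medical knowledge into compact embeddings. This design bridges high-level semantic understanding and fine-grained visual perception without introducing extra textual reasoning overhead.
Reference score (independently computed interest level): 52.256130375429414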
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical Visual Grounding (MVG) aims to identify diagnostically relevant phrases from free-text radiology reports and localize their corresponding regions in medical images, providing interpretable visual evidence to support clinical decision-making. Although recent Vision-Language Models (VLMs) exhibit promising multimodal reasoning ability, their grounding still lacks sufficient spatial precision, largely due to a lack of explicit localization priors when relying solely on latent embeddings. In this work, we analyze this limitation from an attention perspective and propose KnowMVG, a Knowledge-prior and global-local attention enhancement framework for MVG in VLMs that explicitly strengthens spatial awareness during decoding. Specifically, we present a knowledge-enhanced prompting strategy that encodes phrase-related medical knowledge into compact embeddings, together with a global-local attention mechanism that jointly leverages coarse global information and refined local cues to guide precise region localization. This design bridges high-level semantic understanding and fine-grained visual perception without introducing extra textual reasoning overhead. Extensive experiments on four MVG benchmarks demonstrate that our KnowMVG consistently outperforms existing approaches, achieving gains of 3.0% in AP50 and 2.6% in mIoU over prior state-of-the-art methods. Qualitative and ablation studies further validate the effectiveness of each component.
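Abstract (reference translation): Medical Visual Grounding (MVG) aims to identify diagnostically relevant phrases in free-text radiology reports and to localize the corresponding regions in medical images, providing interpretable visual evidence that supports clinical decision-making. Recent Vision-Language Models (VLMs) show promising multimodal reasoning ability, but their grounding still lacks spatial precision, mainly because no explicit localization priors are available when they rely only on latent embeddings. This work analyzes that limitation from an attention perspective and proposes KnowMVG, a knowledge-prior and global-local attention enhancement framework for MVG in VLMs that explicitly strengthens spatial awareness during decoding. Specifically, it presents a knowledge-enhanced prompting strategy that encodes phrase-related medical knowledge into compact embeddings, together with a global-local attention mechanism that combines coarse global information with refined local cues to guide precise region localization. This design bridges high-level semantic understanding and fine-grained visual perception without introducing extra textual reasoning overhead. Extensive experiments on four MVG benchmarks show that KnowMVG consistently outperforms existing approaches, improving AP50 by 3.0% and mIoU by 2.6% over prior state-of-the-art methods. Qualitative and ablation studies further validate the effectiveness of each component.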
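
Note on the reported metrics: the abstract reports gains in AP50 and mIoU but does not spell out the evaluation protocol. The following is a minimal sketch, assuming one predicted bounding box per phrase in `(x_min, y_min, x_max, y_max)` form. It computes the mean IoU and the fraction of predictions whose IoU with the ground truth is at least 0.5, which is a common simplification of AP50 for single-box grounding and may differ from the paper's exact definition. The names `box_iou` and `grounding_scores` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_scores(
    preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (mean IoU, fraction of pairs with IoU >= threshold).

    Assumes preds[i] is the single predicted box for targets[i].
    """
    if len(preds) != len(targets) or len(preds) == 0:
        raise ValueError("preds and targets must be non-empty and equal in length")
    ious: List[float] = [box_iou(p, t) for p, t in zip(preds, targets)]
    mean_iou = sum(ious) / len(ious)
    hit_rate = sum(iou >= threshold for iou in ious) / len(ious)
    return mean_iou, hit_rate


if __name__ == "__main__":
    # Toy boxes for illustration only; these are not data from the paper.
    preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    targets = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
    mean_iou, hit_rate = grounding_scores(preds, targets)
    print(f"mIoU={mean_iou:.3f}, IoU>=0.5 rate={hit_rate:.3f}")
```

In the toy example the first pair overlaps perfectly (IoU 1) and the second overlaps by half a box (IoU 1/3), so only the first counts as a hit at the 0.5 threshold.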

Related papers