Fugu-MT Paper Translation (Summary): Does medical specialization of VLMs enhance discriminative power?: A comprehensive investigation through feature distribution analysis

Paper summary: Does medical specialization of VLMs enhance discriminative power?: A comprehensive investigation through feature distribution analysis

arxiv url: http://arxiv.org/abs/2601.14774v1
Date: Wed, 21 Jan 2026 08:53:40 GMT
Status: Translation complete
Last updated in system: 2026-01-22 21:27:50.297476
Title: Does medical specialization of VLMs enhance discriminative power?: A comprehensive investigation through feature distribution analysis
Authors: Keita Takeda, Tomoya Sakai
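Abstract summary: This study examines the feature representations produced by open-source medical vision-language models (VLMs). Experiments showed that medical VLMs can extract discriminative features that are effective for medical classification tasks. These results suggest that, when developing medical VLMs, strengthening the text encoder matters more than intensive training on medical images.
Reference score (in-house attention score): 2.243145970857166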
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study investigates the feature representations produced by publicly available, open-source medical vision-language models (VLMs). While medical VLMs are expected to capture diagnostically relevant features, their learned representations remain underexplored, and standard evaluations such as classification accuracy do not fully reveal whether they acquire truly discriminative, lesion-specific features. Understanding these representations is crucial for revealing medical image structures and improving downstream tasks in medical image analysis. This study aims to investigate the feature distributions learned by medical VLMs and to evaluate the impact of medical specialization. We analyze the feature distributions that several representative medical VLMs extract from lesion classification datasets spanning multiple imaging modalities. These distributions were compared with those of non-medical VLMs to assess the effect of domain-specific medical training. Our experiments showed that medical VLMs can extract discriminative features that are effective for medical classification tasks. Moreover, we found that non-medical VLMs that incorporate recent improvements in contextual enrichment, such as LLM2CLIP, produce more refined feature representations. Our results imply that enhancing the text encoder is more crucial than training intensively on medical images when developing medical VLMs. Notably, non-medical models are particularly vulnerable to biases introduced by text strings overlaid on images. These findings underscore the need to choose models carefully according to the downstream task, and to consider the potential risks at inference time from background biases such as textual information in images.
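As a hedged illustration of what a "discriminative" feature distribution means in this setting (not the authors' method, which this summary does not specify), one common way to score how well image embeddings separate lesion classes is a Fisher-style ratio of between-class to within-class scatter. The sketch below assumes embeddings have already been extracted as an `(n_samples, dim)` NumPy array; the function names `scatter_ratio`, `l2_normalize`, and `demo`, and the synthetic "encoder" data, are hypothetical stand-ins for real VLM features.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # CLIP-style embeddings are typically compared after L2 normalization.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def scatter_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Return tr(S_B) / tr(S_W), a Fisher-style separability score.

    features: (n_samples, dim) array of image embeddings (assumed precomputed).
    labels:   (n_samples,) array of class labels.
    Higher values mean class means lie farther apart relative to class spread.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    global_mean = features.mean(axis=0)
    between = 0.0  # trace of the between-class scatter matrix
    within = 0.0   # trace of the within-class scatter matrix
    for c in np.unique(labels):
        cls = features[labels == c]
        mean_c = cls.mean(axis=0)
        between += len(cls) * np.sum((mean_c - global_mean) ** 2)
        within += np.sum((cls - mean_c) ** 2)
    return float(between / within)


def demo(seed: int = 0) -> tuple[float, float]:
    """Compare two hypothetical encoders on synthetic two-class data."""
    rng = np.random.default_rng(seed)
    n_per_class, dim = 200, 64
    labels = np.repeat([0, 1], n_per_class)
    shift = np.zeros(dim)
    shift[0] = 1.0  # class 1 is displaced along the first feature axis
    base = rng.normal(size=(2 * n_per_class, dim))
    # Synthetic stand-ins: "encoder_a" separates the classes more than "encoder_b".
    encoder_a = base + 3.0 * shift * labels[:, None]
    encoder_b = base + 0.5 * shift * labels[:, None]
    score_a = scatter_ratio(l2_normalize(encoder_a), labels)
    score_b = scatter_ratio(l2_normalize(encoder_b), labels)
    return score_a, score_b


if __name__ == "__main__":
    a, b = demo()
    print(f"encoder_a: {a:.4f}  encoder_b: {b:.4f}")
```

Because the score uses only matrix traces, it stays cheap for high-dimensional embeddings; a linear probe or a silhouette score would be equally reasonable alternatives for comparing medical and non-medical encoders on the same labeled images.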


List of related papers