Fugu-MT Paper Translation (Abstract): An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity

Paper overview: An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity

arxiv url: http://arxiv.org/abs/2509.13375v1
Date: Tue, 16 Sep 2025 06:11:02 GMT
Status: Translation complete
Last updated in system: 2025-09-18 18:41:50.574172
Title: An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity
Authors: Yuxiao Lee, Xiaofeng Cao, Wei Ye, Jiangchao Yao, Jingkuan Song, Heng Tao Shen
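Abstract summary: Vision-Language Models (VLMs) have demonstrated remarkable zero-shot out-of-distribution (OOD) detection capabilities. This paper presents a systematic empirical analysis of VLM-based OOD detection using in-distribution (ID) and OOD prompts.
Reference score (independently computed attention score): 104.05991573442805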
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot out-of-distribution (OOD) detection capabilities, vital for reliable AI systems. Despite this promising capability, a comprehensive understanding of (1) why they work so effectively, (2) what advantages they have over single-modal methods, and (3) how robust their behavior is remains notably incomplete within the research community. This paper presents a systematic empirical analysis of VLM-based OOD detection using in-distribution (ID) and OOD prompts. (1) Mechanisms: We systematically characterize and formalize key operational properties within the VLM embedding space that facilitate zero-shot OOD detection. (2) Advantages: We empirically quantify the superiority of these models over established single-modal approaches, attributing this distinct advantage to the VLM's capacity to leverage rich semantic novelty. (3) Sensitivity: We uncover a significant and previously under-explored asymmetry in their robustness profile: while exhibiting resilience to common image noise, these VLM-based methods are highly sensitive to prompt phrasing. Our findings contribute a more structured understanding of the strengths and critical vulnerabilities inherent in VLM-based OOD detection, offering crucial, empirically grounded guidance for developing more robust and reliable future designs.
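The abstract does not spell out the scoring rule, so the following is only a minimal sketch of one common way to use ID and OOD prompts for zero-shot detection: take a softmax over the cosine similarities between an image embedding and all prompt embeddings, then treat the probability mass that falls on the OOD prompts as the OOD score. The embeddings here are synthetic stand-ins for the outputs of a VLM's image and text encoders, and the function names, the temperature of 0.01, and the data are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def ood_prompt_score(image_emb, id_text_embs, ood_text_embs, temperature=0.01):
    """Return the softmax mass assigned to OOD prompts (higher means more likely OOD).

    Hypothetical helper: the temperature is an assumed value, not taken from the paper.
    """
    text = l2_normalize(np.concatenate([id_text_embs, ood_text_embs], axis=0))
    img = l2_normalize(image_emb)
    logits = text @ img / temperature   # scaled cosine similarity per prompt
    logits = logits - logits.max()      # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[len(id_text_embs):].sum())


# Synthetic stand-ins for encoder outputs (assumption: no real VLM is loaded here).
rng = np.random.default_rng(0)
dim = 64
id_text = l2_normalize(rng.normal(size=(3, dim)))    # e.g. "a photo of a <ID class>"
ood_text = l2_normalize(rng.normal(size=(2, dim)))   # e.g. prompts for non-ID concepts

# An "ID image" lies near an ID prompt; an "OOD image" lies near an OOD prompt.
id_image = id_text[0] + 0.05 * rng.normal(size=dim)
ood_image = ood_text[1] + 0.05 * rng.normal(size=dim)

score_id_img = ood_prompt_score(id_image, id_text, ood_text)
score_ood_img = ood_prompt_score(ood_image, id_text, ood_text)
print("OOD score for ID-like image: ", score_id_img)
print("OOD score for OOD-like image:", score_ood_img)
```

With a real model, `id_text` and `ood_text` would come from encoding the prompt strings, which is where the prompt-phrasing sensitivity described in the abstract would enter: rewording a prompt changes its embedding and therefore the score.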


Related papers