Fugu-MT Paper Translation (Abstract): Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models

Paper summary: Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models

arxiv url: http://arxiv.org/abs/2511.22019v1
Date: Thu, 27 Nov 2025 01:48:27 GMT
Status: Translation complete
Last updated in system: 2025-12-01 19:47:55.35206
Title: Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models
Title (reference translation): Intra-class probabilistic embeddings for uncertainty estimation in vision-language models
Authors: Zhenxiang Lin, Maryam Haghighat, Will Browne, Dimity Miller
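Abstract summary: We propose a training-free, post-hoc uncertainty estimation method for contrastive vision-language models. Our method is VLM-agnostic, requires no fine-tuning, demonstrates robustness to distribution shift, and works effectively with as few as 10 training images per class.
Reference score (independently computed attention score): 7.5752750293638735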
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language models (VLMs), such as CLIP, have gained popularity for their strong open vocabulary classification performance, but they are prone to assigning high confidence scores to misclassifications, limiting their reliability in safety-critical applications. We introduce a training-free, post-hoc uncertainty estimation method for contrastive VLMs that can be used to detect erroneous predictions. The key to our approach is to measure visual feature consistency within a class, using feature projection combined with multivariate Gaussians to create class-specific probabilistic embeddings. Our method is VLM-agnostic, requires no fine-tuning, demonstrates robustness to distribution shift, and works effectively with as few as 10 training images per class. Extensive experiments on ImageNet, Flowers102, Food101, EuroSAT and DTD show state-of-the-art error detection performance, significantly outperforming both deterministic and probabilistic VLM baselines. Code is available at https://github.com/zhenxianglin/ICPE.
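Abstract (reference translation): Vision-language models (VLMs) such as CLIP have become popular for their strong open-vocabulary classification performance, but they tend to assign high confidence scores to misclassifications, which limits their reliability in safety-critical applications. This paper proposes a training-free, post-hoc uncertainty estimation method for contrastive VLMs that can be used to detect erroneous predictions. The key to the approach is to measure the consistency of visual features within a class by combining feature projection with multivariate Gaussians to create class-specific probabilistic embeddings. The method is VLM-agnostic, requires no fine-tuning, is robust to distribution shift, and works effectively with as few as 10 training images per class. Extensive experiments on ImageNet, Flowers102, Food101, EuroSAT, and DTD show state-of-the-art error detection performance, significantly outperforming both deterministic and probabilistic VLM baselines. Code is available at https://github.com/zhenxianglin/ICPE.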
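
Illustrative sketch (not the authors' code): the abstract describes projecting visual features and fitting a multivariate Gaussian per class, then using within-class consistency to flag likely errors. The Python below is one minimal way such a pipeline could look. Everything specific in it is an assumption made for illustration: the PCA projection, the ridge term, the squared Mahalanobis distance as the uncertainty score, the function names, and the synthetic features standing in for VLM image embeddings. The actual method is in the repository linked above.

```python
import numpy as np

def fit_projection(features, n_components):
    """Assumed projection step: PCA via SVD. Returns (mean, matrix of shape (d, k))."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_class_gaussians(features, labels, mean, proj, ridge=1e-3):
    """Fit one multivariate Gaussian per class in the projected space."""
    z = (features - mean) @ proj
    gaussians = {}
    for c in np.unique(labels):
        zc = z[labels == c]
        mu = zc.mean(axis=0)
        # The ridge keeps the covariance invertible with few samples per class.
        cov = np.cov(zc, rowvar=False) + ridge * np.eye(z.shape[1])
        gaussians[int(c)] = (mu, np.linalg.inv(cov))
    return gaussians

def uncertainty(feature, predicted_class, mean, proj, gaussians):
    """Squared Mahalanobis distance to the predicted class's Gaussian.
    Larger values mean the feature is less consistent with that class."""
    z = (feature - mean) @ proj
    mu, precision = gaussians[predicted_class]
    diff = z - mu
    return float(diff @ precision @ diff)

# Synthetic stand-in for image embeddings: 2 classes, 10 samples each, 32 dims.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (10, 32)), rng.normal(5.0, 1.0, (10, 32))])
labels = np.array([0] * 10 + [1] * 10)

mean, proj = fit_projection(train, n_components=2)
gaussians = fit_class_gaussians(train, labels, mean, proj)

query = rng.normal(0.0, 1.0, 32)  # drawn from class 0's distribution
score_correct = uncertainty(query, 0, mean, proj, gaussians)  # prediction matches
score_wrong = uncertainty(query, 1, mean, proj, gaussians)    # simulated misclassification
print(score_correct, score_wrong)
```

In this toy setup the score for the simulated misclassification comes out much larger than the score for the correct prediction, which is the behaviour an error detector would threshold on. With real embeddings, the number of projected dimensions and the ridge value would need tuning.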
