Fugu-MT Paper Translation (Summary): Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models

Paper summary: Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models

arxiv url: http://arxiv.org/abs/2510.02750v1
Date: Fri, 03 Oct 2025 06:27:33 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.28361
Title: Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models
Title (reference translation): Bayesian Test-Time Adaptation for Object Recognition and Detection Using Vision-Language Models
Authors: Lihua Zhou, Mao Ye, Shuaifeng Li, Nianxin Li, Jinlin Wu, Xiatian Zhu, Lei Deng, Hongbin Liu, Jiebo Luo, Zhen Lei
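Abstract summary: We propose BCA+, a training-free framework for TTA for both object recognition and detection. We formulate adaptation as a Bayesian inference problem and generate final predictions by fusing the initial VLM output with a cache-based prediction. BCA+ achieves state-of-the-art performance on both recognition and detection benchmarks.
Reference score (independently calculated attention score): 86.53246292425699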
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language models (VLMs) such as CLIP and Grounding DINO have achieved remarkable success in object recognition and detection. However, their performance often degrades under real-world distribution shifts. Test-time adaptation (TTA) aims to mitigate this issue by adapting models during inference. Existing methods either rely on computationally expensive backpropagation, which hinders real-time deployment, or focus solely on likelihood adaptation, which overlooks the critical role of the prior. Our prior work, Bayesian Class Adaptation (BCA), addressed these shortcomings for object recognition by introducing a training-free framework that incorporates adaptive priors. Building upon this foundation, we now present Bayesian Class Adaptation plus (BCA+), a unified, training-free framework for TTA for both object recognition and detection. BCA+ introduces a dynamic cache that adaptively stores and updates class embeddings, spatial scales (for detection), and, crucially, adaptive class priors derived from historical predictions. We formulate adaptation as a Bayesian inference problem, where final predictions are generated by fusing the initial VLM output with a cache-based prediction. This cache-based prediction combines a dynamically updated likelihood (measuring feature and scale similarity) and a prior (reflecting the evolving class distribution). This dual-adaptation mechanism, coupled with uncertainty-guided fusion, enables BCA+ to correct both the model's semantic understanding and its contextual confidence. As a training-free method requiring no backpropagation, BCA+ is highly efficient. Extensive experiments demonstrate that BCA+ achieves state-of-the-art performance on both recognition and detection benchmarks.
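Abstract (reference translation): Vision-language models (VLMs) such as CLIP and Grounding DINO have achieved remarkable success in object recognition and detection. However, their performance often degrades under real-world distribution shifts. Test-time adaptation (TTA) aims to mitigate this problem by adapting the model during inference. Existing methods either depend on computationally expensive backpropagation, which hinders real-time deployment, or adapt only the likelihood and overlook the critical role of the prior. Our earlier work, Bayesian Class Adaptation (BCA), addressed these shortcomings for object recognition by introducing a training-free framework that incorporates adaptive priors. Building on that foundation, Bayesian Class Adaptation plus (BCA+) is a unified, training-free TTA framework for both object recognition and detection. BCA+ introduces a dynamic cache that adaptively stores and updates class embeddings, spatial scales (for detection), and, importantly, adaptive class priors derived from past predictions. We formulate adaptation as a Bayesian inference problem and generate final predictions by fusing the initial VLM output with a cache-based prediction. The cache-based prediction combines a dynamically updated likelihood (measuring feature and scale similarity) with a prior (reflecting the evolving class distribution). This dual-adaptation mechanism, combined with uncertainty-guided fusion, lets BCA+ correct both the model's semantic understanding and its contextual confidence. As a training-free method that requires no backpropagation, BCA+ is highly efficient. Extensive experiments show that BCA+ achieves state-of-the-art performance on both recognition and detection benchmarks.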
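
The abstract describes the mechanism only in words, so the following is a minimal illustrative sketch of the recognition case, not the authors' implementation. Every name in it (`BayesianCache`, `fuse`, `predict_and_adapt`), the cosine-similarity softmax used as the likelihood, the count-based prior, the running-mean embedding update, and the entropy-based fusion weights are assumptions chosen for illustration; the paper's actual formulas may differ, and the spatial-scale term used for detection is omitted.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

class BayesianCache:
    """Hypothetical cache: one embedding and one count per class (assumption)."""

    def __init__(self, init_embeddings, temperature=0.1):
        self.embeddings = [list(e) for e in init_embeddings]
        # Each initial embedding counts as one observation, giving a uniform prior.
        self.counts = [1.0] * len(init_embeddings)
        self.temperature = temperature

    def prior(self):
        # Assumed prior: normalized counts of past predictions.
        total = sum(self.counts)
        return [c / total for c in self.counts]

    def likelihood(self, feature):
        # Assumed likelihood: softmax over temperature-scaled cosine similarity.
        return softmax([cosine(feature, e) / self.temperature
                        for e in self.embeddings])

    def posterior(self, feature):
        # Bayes' rule: posterior is proportional to likelihood times prior.
        joint = [l * p for l, p in zip(self.likelihood(feature), self.prior())]
        z = sum(joint)
        return [j / z for j in joint]

    def update(self, feature, probs):
        # Update only the predicted class: running-mean embedding, count + 1.
        k = max(range(len(probs)), key=probs.__getitem__)
        n = self.counts[k]
        self.embeddings[k] = [(n * e + f) / (n + 1)
                              for e, f in zip(self.embeddings[k], feature)]
        self.counts[k] = n + 1

def fuse(vlm_probs, cache_probs):
    # Assumed uncertainty-guided fusion: lower entropy gets a larger weight.
    w_v = math.exp(-entropy(vlm_probs))
    w_c = math.exp(-entropy(cache_probs))
    mixed = [w_v * a + w_c * b for a, b in zip(vlm_probs, cache_probs)]
    z = sum(mixed)
    return [m / z for m in mixed]

def predict_and_adapt(cache, feature, vlm_probs):
    # No gradients anywhere: the cache is updated with simple arithmetic.
    final = fuse(vlm_probs, cache.posterior(feature))
    cache.update(feature, final)
    return final

cache = BayesianCache([[1.0, 0.0], [0.0, 1.0]])
final = predict_and_adapt(cache, [0.9, 0.1], [0.6, 0.4])
```

In this toy run the cache is far more certain than the initial output, so the fused prediction leans further toward class 0, and the prior for class 0 rises for later samples. That is the sense in which both the likelihood and the prior adapt without backpropagation.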

Related papers