Fugu-MT Paper Translation (Abstract): What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective

Paper overview: What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective

arxiv url: http://arxiv.org/abs/2606.14299v1
Date: Fri, 12 Jun 2026 09:35:28 GMT
Status: Translation complete
System update date: 2026-06-15 16:00:42.854913
Title: What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective
Title (reference translation): What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective
Authors: Jiazhen Huang, Xiao Chen, Zhiming Liu, Yaru Sun, Jingyan Jiang, Zhi Wang
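Abstract summary: Test-Time Adaptation (TTA) has recently been extended to CLIP as a lightweight solution. We take a step back from the pursuit of state-of-the-art accuracy and conduct a systematic controlled study of TTA4CLIP. First, we determine the driving factors in parameter-based methods, revealing that adaptation gains are primarily driven by test-time evidence and reliable proxies.
Reference score (independently computed attention score): 8.53292360255317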
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Models (VLMs) such as CLIP have become a standard backbone for open-vocabulary recognition, yet their zero-shot predictions remain vulnerable to distribution shifts encountered at deployment. Test-Time Adaptation (TTA) has recently been extended to CLIP as a lightweight solution, leading to a rapidly growing body of TTA4CLIP methods. However, empirical progress in this area has largely outpaced our understanding of what truly drives adaptation, where their gains originate, and under which shifts they remain reliable. In this paper, we take a step back from the pursuit of state-of-the-art accuracy and conduct a systematic controlled study of TTA4CLIP. We first organize existing methods into three unified paradigms according to what is updated at test time. We then introduce TTABC, an open-source TTA Benchmark for CLIP, which standardizes evaluation protocols and integrates more than 20 representative methods. Our controlled empirical analysis focuses on three key areas. First, we determine the driving factors in parameter-based methods, revealing that adaptation gains are primarily driven by test-time evidence and reliable proxies rather than heavy optimization. Second, we explore evidence utilization beyond heavy parameter tuning, showing that competitive and efficient performance can be achieved through cross- or current-sample evidence and lightweight prototype updates. Finally, we demonstrate that there is no silver bullet for TTA: no single adaptation paradigm is universally optimal, and the preferred paradigm depends on the nature of shift. We hope our benchmark and study provide a clearer understanding of the current TTA4CLIP landscape and establish a foundation for further research.
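Abstract (reference translation): Vision-Language Models (VLMs) such as CLIP have become the standard backbone for open-vocabulary recognition, but their zero-shot predictions remain vulnerable to the distribution shifts that arise at deployment. Test-Time Adaptation (TTA) was recently extended to CLIP as a lightweight solution. However, empirical progress in this area has far outpaced our understanding of what truly drives adaptation, where the gains come from, and under which shifts the methods remain reliable. In this paper, we step back from the pursuit of state-of-the-art accuracy and carry out a systematic controlled study of TTA4CLIP. First, we organize existing methods into three unified paradigms according to what is updated at test time. Next, we introduce TTABC, an open-source TTA benchmark for CLIP that standardizes evaluation protocols and integrates more than 20 representative methods. The controlled empirical analysis focuses on three key areas. First, we identify the driving factors in parameter-based methods and show that adaptation gains are driven mainly by test-time evidence and reliable proxies rather than by heavy optimization. Second, we examine evidence utilization beyond heavy parameter tuning and show that competitive, efficient performance is achievable with cross-sample or current-sample evidence and lightweight prototype updates. Finally, we show that there is no silver bullet for TTA: no single adaptation paradigm is universally optimal, and the preferred paradigm depends on the nature of the shift. We hope that our benchmark and study give a clearer understanding of the current TTA4CLIP landscape and establish a foundation for further research.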


Related papers