Related papers: Ultra-Light Test-Time Adaptation for Vision--Language Models

URL: http://arxiv.org/abs/2511.09101v1
Date: Thu, 13 Nov 2025 01:31:49 GMT
Title: Ultra-Light Test-Time Adaptation for Vision--Language Models
Authors: Byunghyun Kim
Abstract summary: Vision-Language Models (VLMs) such as CLIP achieve strong zero-shot recognition by comparing image embeddings to text-derived class prototypes. Under domain shift, they suffer from feature drift, class-prior mismatch, and severe miscalibration. We propose Ultra-Light Test-Time Adaptation (UL-TTA), a fully training-free and backprop-free framework that freezes the backbone and adapts only logit-level parameters.
Score: 0.6816905600359814
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language Models (VLMs) such as CLIP achieve strong zero-shot recognition by comparing image embeddings to text-derived class prototypes. However, under domain shift, they suffer from feature drift, class-prior mismatch, and severe miscalibration. Existing test-time adaptation (TTA) methods often require backpropagation through large backbones, covariance estimation, or heavy memory/state, which is problematic for streaming and edge scenarios. We propose Ultra-Light Test-Time Adaptation (UL-TTA), a fully training-free and backprop-free framework that freezes the backbone and adapts only logit-level parameters: class prototypes, class priors, and temperature. UL-TTA performs an online EM-style procedure with (i) selective sample filtering to use only confident predictions, (ii) closed-form Bayesian updates for prototypes and priors anchored by text and Dirichlet priors, (iii) decoupled temperatures for prediction vs. calibration, and (iv) lightweight guards (norm clipping, prior KL constraints, smoothed temperature) to prevent drift in long streams. Across large-scale cross-domain and OOD benchmarks (PACS, Office-Home, DomainNet, Terra Incognita, ImageNet-R/A/V2/Sketch; ~726K test samples) and strong TTA baselines including Tent, T3A, CoTTA, SAR, Tip-Adapter, and FreeTTA, UL-TTA consistently improves top-1 accuracy (e.g., +4.7 points over zero-shot CLIP on average) while reducing ECE by 20-30%, with less than 8% latency overhead. Long-stream experiments up to 200K samples show no collapse. Our results demonstrate that logit-level Bayesian adaptation is sufficient to obtain state-of-the-art accuracy-calibration trade-offs for VLMs under domain shift, without updating any backbone parameters.
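
Illustrative sketch (not the authors' implementation): the abstract describes an online EM-style loop that filters for confident samples and then applies closed-form updates to class prototypes and class priors, anchored by text embeddings and a Dirichlet prior. The NumPy code below is one minimal way such a loop could look. The class name LogitLevelAdapter, the parameters temp, conf_thresh, kappa, and alpha, and the specific update formulas are all assumptions made for illustration; the decoupled calibration temperature and the drift guards (norm clipping, prior KL constraints, smoothed temperature) mentioned in the abstract are omitted.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class LogitLevelAdapter:
    """Hypothetical backprop-free adapter: only prototypes and priors change."""

    def __init__(self, text_protos, temp=0.01, conf_thresh=0.6,
                 kappa=10.0, alpha=10.0):
        self.text = l2_normalize(np.asarray(text_protos, dtype=float))
        k, d = self.text.shape
        self.temp = temp                # prediction temperature (assumed value)
        self.conf_thresh = conf_thresh  # minimum max-probability to accept a sample
        self.kappa = kappa              # pseudo-count tying prototypes to text anchors
        self.alpha = alpha              # symmetric Dirichlet concentration per class
        self.feat_sum = np.zeros((k, d))  # soft-assigned feature sums per class
        self.counts = np.zeros(k)         # soft sample counts per class

    @property
    def prototypes(self):
        # Posterior-mean style prototype: text anchor blended with observed features.
        mean = (self.kappa * self.text + self.feat_sum) / (self.kappa + self.counts)[:, None]
        return l2_normalize(mean)

    @property
    def priors(self):
        # Dirichlet posterior mean over class frequencies.
        post = self.alpha + self.counts
        return post / post.sum()

    def predict_proba(self, feat):
        feat = l2_normalize(np.asarray(feat, dtype=float))
        logits = self.prototypes @ feat / self.temp + np.log(self.priors)
        return softmax(logits)

    def step(self, feat):
        """Predict one sample, then update statistics if the prediction is confident."""
        probs = self.predict_proba(feat)       # E-step: soft class responsibilities
        if probs.max() >= self.conf_thresh:    # selective sample filtering
            f = l2_normalize(np.asarray(feat, dtype=float))
            self.feat_sum += probs[:, None] * f[None, :]  # M-step: sufficient statistics
            self.counts += probs
        return int(probs.argmax()), probs


if __name__ == "__main__":
    adapter = LogitLevelAdapter(text_protos=[[1.0, 0.0], [0.0, 1.0]])
    for _ in range(5):
        label, probs = adapter.step([1.0, 0.2])
    print(label, adapter.priors, adapter.prototypes)
```

Because every update is a running sum, the per-sample cost in this sketch is one matrix-vector product plus a few vector operations, and the frozen backbone is only needed to produce the feature vector passed to step.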

Related papers

FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation [9.28697795097814]
Test-Time Adaptation is essential for enabling deep learning models to handle real-world data distribution shifts. Backpropagation-based methods are not suitable for low-end deployment devices. We propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA.
arXiv Detail & Related papers (2026-03-05T02:12:48Z)
Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction. We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
arXiv Detail & Related papers (2025-09-30T03:56:44Z)
Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization [85.50560211492898]
Test-time adaptation (TTA) may fail to improve or even harm the model performance when test data have mixed distribution shifts. This is often a key obstacle preventing existing TTA methods from being deployed in the real world. We propose a sharpness-aware and reliable entropy minimization method, called SAR, for stabilizing TTA from two aspects.
arXiv Detail & Related papers (2025-09-05T10:03:00Z)
BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis [41.09181390655176]
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under temporally evolving distribution shifts common in real-world scenarios. We formalize this practical problem as Continual-Temporal Test-Time Adaptation (CT-TTA), where test distributions evolve gradually over time. We propose BayesTTA, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations.
arXiv Detail & Related papers (2025-07-11T14:02:54Z)
Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models [3.1099372412393524]
This paper argues, from a loss landscape perspective, that backpropagation in existing methods is unnecessary. It proposes a simple yet effective framework called Test-time Loss Landscape Adaptation (TLLA). In the prompt tuning stage, a Sharpness-Aware Prompt Tuning (SAPT) method is introduced to identify the training flat minimum. In the test stage, a Sharpness-based Test Sample Selection (STSS) approach is utilized to ensure the alignment of flat minima.
arXiv Detail & Related papers (2025-01-31T03:10:48Z)
Enhancing Test Time Adaptation with Few-shot Guidance [62.49199492255226]
Deep neural networks often suffer significant performance drops when facing domain shifts between training (source) and test (target) data. Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to out-of-distribution streaming target data. We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
arXiv Detail & Related papers (2024-09-02T15:50:48Z)
Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models [4.655740975414312]
This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalization of large-scale vision-language models (VLMs). TTL offers a test-time-efficient adaptation approach that updates the attention weights of the transformer by maximizing prediction confidence.
arXiv Detail & Related papers (2024-07-22T17:59:19Z)
Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [65.21599711087538]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications. We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation [5.749155230209001]
Fully-test-time adaptation (F-TTA) can mitigate performance loss due to distribution shifts between train and test data. We present a general framework for improving robustness of F-TTA to noisy samples, inspired by self-paced learning and robust loss functions.
arXiv Detail & Related papers (2023-09-07T18:44:58Z)
Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data. We propose an active sample selection criterion to identify reliable and non-redundant samples. We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
Tent: Fully Test-time Adaptation by Entropy Minimization [77.85911673550851]
A model must adapt itself to generalize to new and different data during testing. In this setting of fully test-time adaptation the model has only the test data and its own parameters. We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions.
arXiv Detail & Related papers (2020-06-18T17:55:28Z)
