Related papers: The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models

URL: http://arxiv.org/abs/2506.24000v1
Date: Mon, 30 Jun 2025 16:05:55 GMT
Title: The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models
Authors: Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan
Abstract summary: TTA-VLM is a comprehensive benchmark for evaluating TTA methods on vision-language models. Our benchmark implements 8 episodic TTA and 7 online TTA methods within a unified and reproducible framework. We extend the evaluation to SigLIP--a model trained with a Sigmoid loss--and include training-time tuning methods such as CoOp, MaPLe, and TeCoA to assess generality.
Score: 120.42853706967188
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-time adaptation (TTA) methods have gained significant attention for enhancing the performance of vision-language models (VLMs) such as CLIP during inference, without requiring additional labeled data. However, current TTA research generally suffers from major limitations such as duplication of baseline results, limited evaluation metrics, inconsistent experimental settings, and insufficient analysis. These problems hinder fair comparisons between TTA methods and obscure their practical strengths and weaknesses. To address these challenges, we introduce TTA-VLM, a comprehensive benchmark for evaluating TTA methods on VLMs. Our benchmark implements 8 episodic TTA and 7 online TTA methods within a unified and reproducible framework, and evaluates them across 15 widely used datasets. Unlike prior studies focused solely on CLIP, we extend the evaluation to SigLIP--a model trained with a Sigmoid loss--and include training-time tuning methods such as CoOp, MaPLe, and TeCoA to assess generality. Beyond classification accuracy, TTA-VLM incorporates various evaluation metrics, including robustness, calibration, out-of-distribution detection, and stability, enabling a more holistic assessment of TTA methods. Through extensive experiments, we find that 1) existing TTA methods produce limited gains compared to the previous pioneering work; 2) current TTA methods exhibit poor collaboration with training-time fine-tuning methods; 3) accuracy gains frequently come at the cost of reduced model trustworthiness. We release TTA-VLM to provide fair comparison and comprehensive evaluation of TTA methods for VLMs, and we hope it encourages the community to develop more reliable and generalizable TTA strategies.

Related papers

Test-Time Adaptation with Binary Feedback [50.20923012663613]
BiTTA is a novel dual-path optimization framework that balances binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions. Experiments show BiTTA achieves 13.3%p accuracy improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-24T05:24:10Z)
Test-time Correlation Alignment [2.389598109913754]
Test-Time Adaptation (TTA) adapts using only unlabeled test data. Test-time Correlation Alignment (TCA) can enhance test performances with a theoretical guarantee. LinearTCA applies a simple linear transformation to achieve both instance and correlation alignment without additional model updates. LinearTCA+ serves as a plug-and-play module that can easily boost existing TTA methods.
arXiv Detail & Related papers (2025-05-01T13:59:13Z)
Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings. We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z)
Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation [14.112999441288615]
Test-time adaptation (TTA) adapts pre-trained models during inference using unlabeled test data. There is still a significant performance gap between the TTA approaches and their supervised counterparts. We propose ATASeg framework, which consists of two parts, i.e., model adapter and label annotator.
arXiv Detail & Related papers (2023-12-04T12:16:02Z)
Persistent Test-time Adaptation in Recurring Testing Scenarios [12.024233973321756]
Current test-time adaptation (TTA) approaches aim to adapt a machine learning model to environments that change continuously. Yet, it is unclear whether TTA methods can maintain their adaptability over prolonged periods. We propose persistent TTA (PeTTA) which senses when the model is diverging towards collapse and adjusts the adaptation strategy.
arXiv Detail & Related papers (2023-11-30T02:24:44Z)
From Question to Exploration: Test-Time Adaptation in Semantic Segmentation? [21.27237423511349]
Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to test data with potential distribution shifts. We investigate the applicability of existing classic TTA strategies in semantic segmentation.
arXiv Detail & Related papers (2023-10-09T01:59:49Z)
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line [65.14099135546594]
Recent test-time adaptation (TTA) methods drastically strengthen the ACL and AGL trends in models, even in shifts where models showed very weak correlations before. Our results show that by combining TTA with AGL-based estimation methods, we can estimate the OOD performance of models with high precision for a broader set of distribution shifts.
arXiv Detail & Related papers (2023-10-07T23:21:25Z)
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
Test-Time Adaptation with Perturbation Consistency Learning [32.58879780726279]
We propose a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts. Our method can achieve higher or comparable performance with less inference time over strong PLM backbones.
arXiv Detail & Related papers (2023-04-25T12:29:22Z)
Evaluation of Test-Time Adaptation Under Computational Time Constraints [80.40939405129102]
Test Time Adaptation (TTA) methods leverage unlabeled data at test time to adapt to distribution shifts. Current evaluation protocols overlook the effect of this extra cost, affecting their real-world applicability. We propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream.
arXiv Detail & Related papers (2023-04-10T18:01:47Z)
Towards Stable Test-Time Adaptation in Dynamic Wild World [60.98073673220025]
Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. Online model updating of TTA may be unstable and this is often a key obstacle preventing existing TTA methods from being deployed in the real world.
arXiv Detail & Related papers (2023-02-24T02:03:41Z)
