A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models
- URL: http://arxiv.org/abs/2405.14977v2
- Date: Mon, 9 Sep 2024 17:33:57 GMT
- Title: A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models
- Authors: Mario Döbler, Robert A. Marsden, Tobias Raichle, Bin Yang
- Abstract summary: In deep learning, maintaining robustness against distribution shifts is critical.
This work explores a broad range of possibilities to adapt vision-language foundation models at test-time.
- Score: 3.0495235326282186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of possibilities to adapt vision-language foundation models at test-time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers various prompt engineering strategies, including handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially enhances average performance compared to text-space-only ensembles. Since online test-time adaptation has been shown to be effective in mitigating performance drops under distribution shift, the study extends its scope to evaluate the effectiveness of existing test-time adaptation methods that were originally designed for vision-only classification models. Through extensive experimental evaluations conducted across multiple datasets and diverse model architectures, the research demonstrates the effectiveness of these adaptation strategies. Code is available at: https://github.com/mariodoebler/test-time-adaptation
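A minimal sketch can make the vision-text-space ensemble concrete. The snippet below contrasts a text-space prompt ensemble (averaging per-template text embeddings into a single classifier) with one plausible reading of a vision-text-space ensemble that averages per-template logits instead; the random placeholder embeddings, template count, and logit scale are assumptions for illustration, not the paper's code.

```python
# Hedged sketch: text-space vs. vision-text-space prompt ensembling for a
# CLIP-like model. The random embeddings stand in for encode_image /
# encode_text outputs; template count and logit scale are assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_templates, num_classes, dim = 7, 10, 512

# text_emb[t, c]: embedding of prompt template t filled with class name c.
text_emb = F.normalize(torch.randn(num_templates, num_classes, dim), dim=-1)
img_emb = F.normalize(torch.randn(4, dim), dim=-1)  # batch of image embeddings

# Text-space ensemble: average per-template class embeddings into a single
# classifier, then score images against it.
text_classifier = F.normalize(text_emb.mean(dim=0), dim=-1)   # (C, D)
logits_text_space = 100.0 * img_emb @ text_classifier.t()     # (B, C)

# Vision-text-space ensemble (one plausible reading): score images against
# every template separately and average the logits, i.e. ensemble in the
# joint image-text similarity space rather than in text space alone.
per_template = 100.0 * torch.einsum("bd,tcd->btc", img_emb, text_emb)
logits_vision_text = per_template.mean(dim=1)                 # (B, C)

print(logits_text_space.argmax(-1), logits_vision_text.argmax(-1))
```

Averaging logits preserves each template's decision boundary, whereas averaging embeddings first collapses all templates into a single direction per class, which is one intuition for why the two ensembles can behave differently.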
Related papers
- Beyond Model Adaptation at Test Time: A Survey [43.03129492126422]
Machine learning algorithms struggle when samples in the test distribution start to deviate from the ones observed during training.
Test-time adaptation combines the benefits of domain adaptation and domain generalization by training models only on source data and adapting to the target distribution at test time.
We provide a comprehensive and systematic review on test-time adaptation, covering more than 400 recent papers.
arXiv Detail & Related papers (2024-11-06T06:13:57Z)
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free regimes.
We maintain a lightweight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples (a cache of this kind is sketched after this entry).
We theoretically justify our method and empirically verify its effectiveness on both out-of-distribution and cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
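To make the key-value memory idea above concrete, here is a hedged sketch of a training-free cache in the spirit of the BoostAdapter entry: confident (low-entropy) test features are stored per pseudo-class, and retrieval over the cache yields logits that can be blended with the zero-shot prediction. The capacity, the exponential affinity, and the fusion weight are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of a lightweight key-value memory for training-free TTA,
# in the spirit of the BoostAdapter entry above. Capacity, the exponential
# affinity, and the fusion weight are illustrative assumptions.
import torch
import torch.nn.functional as F

class KVMemory:
    def __init__(self, capacity_per_class=8, num_classes=10):
        self.capacity = capacity_per_class
        # pseudo-class -> list of (entropy, unit-norm feature); most confident kept
        self.store = {c: [] for c in range(num_classes)}

    def add(self, feature, probs):
        # Keep only low-entropy (confident) historical test samples.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum().item()
        c = int(probs.argmax())
        self.store[c].append((entropy, feature))
        self.store[c] = sorted(self.store[c], key=lambda e: e[0])[: self.capacity]

    def retrieve_logits(self, feature, beta=5.0):
        keys, labels = [], []
        for c, items in self.store.items():
            for _, f in items:
                keys.append(f)
                labels.append(c)
        if not keys:
            return torch.zeros(len(self.store))
        keys = torch.stack(keys)                              # (N, D)
        affinity = torch.exp(-beta * (1.0 - keys @ feature))  # cosine-based weights
        onehot = F.one_hot(torch.tensor(labels), len(self.store)).float()
        return affinity @ onehot                              # (C,) cache logits

# Usage with placeholder CLIP-like features and zero-shot logits.
mem = KVMemory()
feat = F.normalize(torch.randn(512), dim=0)
zs_logits = torch.randn(10)
mem.add(feat, zs_logits.softmax(-1))
fused = zs_logits + 2.0 * mem.retrieve_logits(feat)  # blend cache and zero-shot
```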
- Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment [0.23020018305241333]
This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts.
The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies.
arXiv Detail & Related papers (2024-07-01T20:25:20Z)
- BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models [20.88680592729709]
We propose BaFTA, a novel backpropagation-free algorithm for test-time adaptation of vision-language models.
BaFTA directly estimates class centroids using online clustering within a projected embedding space.
We demonstrate that BaFTA consistently outperforms state-of-the-art test-time adaptation methods in both effectiveness and efficiency (a minimal centroid-update sketch follows this entry).
arXiv Detail & Related papers (2024-06-17T08:16:24Z)
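The BaFTA entry above centers on backprop-free centroid estimation via online clustering. The sketch below shows one simplified version in which class centroids are initialized from text embeddings and updated by a running mean over nearest-centroid assignments; the update rule, the omission of the projected space, and the equal-weight fusion with text logits are simplifying assumptions.

```python
# Hedged sketch of backprop-free centroid estimation via online clustering,
# loosely following the BaFTA entry above. The running-mean update, the
# omission of the projected space, and the equal-weight fusion with text
# logits are simplifying assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, D = 10, 512
text_classifier = F.normalize(torch.randn(C, D), dim=-1)  # placeholder text embeddings
centroids = text_classifier.clone()  # initialize class centroids from text
counts = torch.ones(C)

def adapt_step(img_feat):
    # Assign the test feature to its nearest centroid (no gradients anywhere).
    c = int((img_feat @ centroids.t()).argmax())
    # Online mean update of the winning centroid, then re-normalize.
    counts[c] += 1
    centroids[c] = F.normalize(
        centroids[c] + (img_feat - centroids[c]) / counts[c], dim=-1)
    # Predict from both text similarities and centroid similarities.
    return (img_feat @ text_classifier.t() + img_feat @ centroids.t()) / 2

for _ in range(100):  # stream of placeholder test embeddings
    pred = adapt_step(F.normalize(torch.randn(D), dim=0)).argmax()
```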
- Effectiveness of Vision Language Models for Open-world Single Image Test Time Adaptation [15.621092104244003]
We propose a novel framework to address the challenging real-world task of Single Image Test Time Adaptation.
We leverage large-scale Vision Language Models like CLIP to enable real-time adaptation on a per-image basis.
The proposed framework, ROSITA, combines these components, enabling continuous online adaptation of Vision Language Models.
arXiv Detail & Related papers (2024-06-01T16:21:42Z)
- Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models [29.75562085178755]
We present a systematic study to understand the roles of key components in retrieval-augmented adaptation.
We unveil new insights into uni-modal and cross-modal retrieval and highlight the critical role of the logit ensemble for effective adaptation (a logit-ensemble sketch follows this entry).
arXiv Detail & Related papers (2024-05-02T16:59:05Z)
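The logit ensemble highlighted above can be illustrated as follows: zero-shot logits from class text embeddings are combined with logits computed from retrieved per-class examples. The retrieved features, the mean over neighbours, and the mixing weight alpha are placeholders for this sketch, not the paper's setup.

```python
# Hedged sketch of a logit ensemble for retrieval-augmented adaptation, as
# highlighted in the entry above. Retrieved features, the mean over K
# neighbours, and the mixing weight alpha are illustrative placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, K, D = 4, 10, 16, 512

img = F.normalize(torch.randn(B, D), dim=-1)           # test image embeddings
text = F.normalize(torch.randn(C, D), dim=-1)          # class text embeddings
retrieved = F.normalize(torch.randn(C, K, D), dim=-1)  # K retrieved examples/class

zero_shot_logits = 100.0 * img @ text.t()              # (B, C)

# Retrieval logits: similarity of each image to each class's retrieved
# examples, averaged over the K neighbours.
retrieval_logits = 100.0 * torch.einsum("bd,ckd->bck", img, retrieved).mean(-1)

# Logit ensemble: convex combination of the two prediction sources.
alpha = 0.5  # assumed weight; in practice tuned or set per dataset
ensemble_logits = alpha * zero_shot_logits + (1 - alpha) * retrieval_logits
print(ensemble_logits.argmax(-1))
```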
- Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method (a consistency objective of this kind is sketched after this entry).
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
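As a rough illustration of the consistency-regularization idea above, the sketch below pushes predictions on two augmented views of the same unlabelled target image to agree, using the weak view as a pseudo-target. The stand-in classifier and noise-based "augmentations" are assumptions; a real SFDA pipeline would start from the trained source model and proper augmentations.

```python
# Hedged sketch of a consistency-regularization objective for source-free
# domain adaptation: predictions on two augmented views of the same
# unlabelled target image are pushed to agree. The stand-in classifier and
# noise-based "augmentations" are assumptions, not the paper's pipeline.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in source model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def consistency_loss(x_weak, x_strong):
    # Pseudo-targets come from the weakly augmented view (no gradient).
    with torch.no_grad():
        p_weak = model(x_weak).softmax(-1)
    log_p_strong = model(x_strong).log_softmax(-1)
    # Cross-entropy between the two views' predictive distributions.
    return -(p_weak * log_p_strong).sum(-1).mean()

# One adaptation step on a placeholder unlabelled target batch.
x = torch.randn(8, 3, 32, 32)
loss = consistency_loss(x + 0.01 * torch.randn_like(x),   # "weak" view
                        x + 0.10 * torch.randn_like(x))   # "strong" view
optimizer.zero_grad()
loss.backward()
optimizer.step()
```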
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference (a canonical entropy-minimization update is sketched after this entry).
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
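A canonical example of the online test-time adaptation recipe these surveys cover, and representative of the vision-only methods the main paper re-evaluates for CLIP, is Tent-style entropy minimization over normalization parameters. The toy backbone below is a stand-in for a pre-trained network; only the BatchNorm affine parameters are updated from unlabeled test batches.

```python
# Hedged sketch of Tent-style entropy minimization, a canonical online TTA
# recipe of the kind covered by these surveys. The toy backbone is a
# stand-in for a pre-trained network; only BatchNorm affine parameters are
# updated from unlabeled test batches.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
model.train()  # keep BatchNorm in train mode so test-batch statistics are used

# Optimize only the normalization layers' affine parameters, as in Tent.
bn_params = [p for m in model.modules()
             if isinstance(m, nn.BatchNorm2d) for p in m.parameters()]
optimizer = torch.optim.SGD(bn_params, lr=1e-3)

for _ in range(5):                    # stream of unlabeled test batches
    x = torch.randn(16, 3, 32, 32)    # placeholder test batch
    probs = model(x).softmax(-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
```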
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning (sketched after this entry).
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
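The greedy search described above can be sketched as follows: starting from an empty prompt, demonstrations are added one at a time, keeping the candidate whose prompt yields the least biased label distribution on a content-free probe. The `label_probs` stand-in, the KL-from-uniform bias metric, and the toy demonstrations are assumptions standing in for a real LLM call and the paper's exact metric.

```python
# Hedged sketch of greedy prompt search guided by a predictive-bias metric,
# loosely following the entry above. `label_probs` is a hash-based
# placeholder for querying an LLM with a content-free probe; the
# KL-from-uniform bias metric and toy demonstrations are assumptions.
import torch

def label_probs(prompt: str) -> torch.Tensor:
    # Stand-in for an LLM call returning label probabilities for "N/A".
    g = torch.Generator().manual_seed(hash(prompt) % (2**31))
    return torch.rand(2, generator=g).softmax(-1)

def bias(prompt: str) -> float:
    # Predictive bias as KL divergence from the uniform label distribution.
    p = label_probs(prompt)
    return float((p * (p * len(p)).log()).sum())

candidates = ["ex1: great -> pos", "ex2: awful -> neg", "ex3: fine -> pos"]
prompt, pool = "", set(candidates)
for _ in range(2):  # greedily pick two demonstrations
    best = min(pool, key=lambda c: bias(prompt + c + "\n"))
    prompt += best + "\n"
    pool.remove(best)
print(prompt)
```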
This list is automatically generated from the titles and abstracts of the papers on this site.