Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
- URL: http://arxiv.org/abs/2403.12952v2
- Date: Tue, 10 Dec 2024 22:53:16 GMT
- Title: Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
- Authors: Elaine Sui, Xiaohan Wang, Serena Yeung-Levy,
- Abstract summary: Test-Time Prototype Shifting (TPS) is a pioneering approach designed to adapt vision-language models to test datasets using unlabeled test inputs.
TPS not only facilitates optimization-free prototype reuse for subsequent predictions but also enables seamless integration with current advancements in prompt engineering.
A notable aspect of our framework is its significantly reduced memory and computational demands when compared to conventional text-prompt tuning methods.
- Score: 19.683461002518147
- License:
- Abstract: Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting. Despite their promise, the effectiveness of these models often diminishes due to domain shifts in test environments. To address this, we introduce the Test-Time Prototype Shifting (TPS) framework, a pioneering approach designed to adapt VLMs to test datasets using unlabeled test inputs. Our method is based on the notion of modulating per-class prototypes in the shared embedding space. By pre-computing and caching prototypes generated with the pre-trained text encoder, TPS not only facilitates optimization-free prototype reuse for subsequent predictions but also enables seamless integration with current advancements in prompt engineering. At test-time, TPS dynamically learns shift vectors for each prototype based solely on the given test sample, effectively bridging the domain gap and enhancing classification accuracy. A notable aspect of our framework is its significantly reduced memory and computational demands when compared to conventional text-prompt tuning methods. Extensive evaluations across 15 image classification datasets involving natural distribution shifts and cross-dataset generalization, as well as in context-dependent visual reasoning, demonstrate TPS's superior performance, achieving state-of-the-art results while reducing resource requirements.
Related papers
- Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models [3.1099372412393524]
This paper unveils the unnecessary nature of backpropagation in existing methods from a loss landscape perspective.
It proposes a simple yet effective framework called Test-time Loss Landscape Adaptation (TLLA)
In the prompt tuning stage, a Sharpness-Aware Prompt Tuning (SAPT) method is introduced to identify the training flat minimum.
In the test stage, a Sharpness-based Test Sample Selection (STSS) approach is utilized to ensure the alignment of flat minima.
arXiv Detail & Related papers (2025-01-31T03:10:48Z) - Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference.
We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem.
Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves an up to 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z) - Align Your Prompts: Test-Time Prompting with Distribution Alignment for
Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain.
Our method improves zero-shot top- 1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z) - Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language
Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z) - Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts.
We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z) - Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.