Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
- URL: http://arxiv.org/abs/2209.07511v1
- Date: Thu, 15 Sep 2022 17:55:11 GMT
- Title: Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
- Authors: Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao
- Abstract summary: We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
- Score: 107.05966685291067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained vision-language models (e.g., CLIP) have shown promising
zero-shot generalization in many downstream tasks with properly designed text
prompts. Instead of relying on hand-engineered prompts, recent works learn
prompts using the training data from downstream tasks. While effective,
training on domain-specific data reduces a model's generalization capability to
unseen new domains. In this work, we propose test-time prompt tuning (TPT), a
method that can learn adaptive prompts on the fly with a single test sample.
For image classification, TPT optimizes the prompt by minimizing the entropy
with confidence selection so that the model has consistent predictions across
different augmented views of each test sample. In evaluating generalization to
natural distribution shifts, TPT improves the zero-shot top-1 accuracy of CLIP
by 3.6% on average, surpassing previous prompt tuning approaches that require
additional task-specific training data. In evaluating cross-dataset
generalization with unseen categories, TPT performs on par with the
state-of-the-art approaches that use additional training data. Project page:
https://azshue.github.io/TPT.
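As a concrete illustration of the procedure described in the abstract, below is a minimal PyTorch-style sketch of a single TPT update on one test image. The model wrapper `clip_model`, the learnable prompt tensor `prompt_ctx`, the `augment` pipeline, and every hyperparameter value here are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def tpt_step(clip_model, prompt_ctx, image, augment,
             n_views=64, keep_ratio=0.1, lr=5e-3):
    """One test-time prompt tuning step for a single test image (sketch).

    Assumes `prompt_ctx` is a leaf tensor with requires_grad=True and that
    `clip_model(images, prompt_ctx)` returns class logits for a batch of
    images under the current learnable prompt context.
    """
    optimizer = torch.optim.AdamW([prompt_ctx], lr=lr)

    # 1. Build a batch of random augmented views of the single test image.
    views = torch.stack([augment(image) for _ in range(n_views)])

    # 2. Per-view class probabilities under the current prompt.
    probs = clip_model(views, prompt_ctx).softmax(dim=-1)   # (n_views, n_classes)

    # 3. Confidence selection: keep only the low-entropy (confident) views.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    keep = entropy.topk(max(1, int(keep_ratio * n_views)), largest=False).indices

    # 4. Minimize the entropy of the averaged prediction over selected views,
    #    pushing the prompt toward consistent, confident predictions.
    avg_probs = probs[keep].mean(dim=0)
    loss = -(avg_probs * avg_probs.clamp_min(1e-12).log()).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 5. Predict with the adapted prompt.
    with torch.no_grad():
        return clip_model(image.unsqueeze(0), prompt_ctx).argmax(dim=-1)
```

The view count, selection ratio, and learning rate above are placeholder defaults; the essential ingredients are the per-view entropy used for confidence selection and the entropy of the averaged prediction used as the tuning objective.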
Related papers
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free methods.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
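As a rough illustration of the key-value memory described in the BoostAdapter entry above, the sketch below caches (image feature, pseudo-label) pairs and blends a similarity-weighted retrieval vote with the zero-shot CLIP logits. The cache-eviction policy, fusion weight, and function names are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

class FeatureMemory:
    """Lightweight key-value memory: keys are image features, values are
    pseudo-label distributions (illustrative sketch)."""

    def __init__(self, n_classes, per_class=3):
        self.cap = n_classes * per_class
        self.keys, self.values = [], []

    def add(self, feat, probs):
        # Store the normalized feature with its pseudo-label distribution.
        self.keys.append(F.normalize(feat, dim=-1))
        self.values.append(probs)
        # Simple FIFO cap as a stand-in for a smarter replacement policy.
        self.keys, self.values = self.keys[-self.cap:], self.values[-self.cap:]

    def retrieve(self, feat, beta=5.0):
        if not self.keys:
            return 0.0
        K = torch.stack(self.keys)                # (m, d)
        V = torch.stack(self.values)              # (m, n_classes)
        sim = F.normalize(feat, dim=-1) @ K.t()   # cosine similarity to each key
        # Similarity-weighted vote over the cached pseudo-labels.
        return torch.exp(beta * (sim - 1.0)) @ V

def boosted_logits(clip_logits, image_feat, memory, alpha=2.0):
    """Fuse zero-shot CLIP logits with memory-retrieval logits."""
    return clip_logits + alpha * memory.retrieve(image_feat)
```

In spirit, the memory would hold both confident historical test samples and augmented "boosting" samples of the current instance, with the retrieval term sharpening predictions under distribution shift.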
- Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models [4.655740975414312]
This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalization of large-scale vision-language models (VLMs).
TTL offers a test-time-efficient adaptation approach that updates the attention weights of the transformer by maximizing prediction confidence.
arXiv Detail & Related papers (2024-07-22T17:59:19Z)
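The TTL entry above replaces prompt tuning with low-rank updates to attention weights, optimized by maximizing prediction confidence. A minimal sketch of that recipe follows, using a generic LoRA-style wrapper; the injection points, rank, and scaling are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual:
    y = W x + scale * B A x (illustrative LoRA-style module)."""

    def __init__(self, base: nn.Linear, rank=4, scale=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

def confidence_loss(logits):
    # Maximizing prediction confidence == minimizing prediction entropy.
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
```

At test time, only the low-rank A/B matrices injected into selected attention projections would be updated for a step or two on augmented views of the test sample, with the rest of the model frozen.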
- Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift, bridging the gap to the test domain.
Our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z)
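One common way to implement the distribution alignment mentioned in the entry above is to match the mean and variance of the test sample's token features against statistics precomputed on a source (proxy) dataset, and to add that penalty to the entropy objective. The sketch below assumes such offline statistics exist and is not the paper's exact loss.

```python
import torch

def alignment_loss(test_feats, src_mean, src_var):
    """L1 distance between test-time feature statistics and precomputed
    source-domain statistics (illustrative sketch).

    test_feats: (n_views, n_tokens, d) intermediate visual features
    src_mean, src_var: (d,) statistics collected offline on a proxy dataset
    """
    mu = test_feats.mean(dim=(0, 1))
    var = test_feats.var(dim=(0, 1), unbiased=False)
    return (mu - src_mean).abs().mean() + (var - src_var).abs().mean()

def total_objective(entropy_loss, align_loss, weight=100.0):
    # Joint test-time objective; the balancing weight is a placeholder.
    return entropy_loss + weight * align_loss
```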
- Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
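The DiffTPT entry above uses a pre-trained diffusion model to synthesize augmented views and then filters them by feature similarity to the original image. The rough sketch below stands in for that pipeline with Hugging Face diffusers' img2img pipeline and an assumed `clip_image_features` helper for the cosine-similarity filter; it is not the paper's generation setup.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline

def diffusion_views(init_image, prompt, n_views=8, strength=0.7):
    """Generate diverse augmented views of a single test image with a
    pre-trained diffusion model (img2img used here as a stand-in)."""
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    out = pipe(prompt=prompt, image=init_image, strength=strength,
               num_images_per_prompt=n_views)
    return out.images

def filter_views(views, init_image, clip_image_features, cos_thresh=0.8):
    """Keep generated views whose CLIP image features stay close to the
    original image, discarding semantically drifted samples."""
    ref = clip_image_features(init_image)   # assumed helper -> (d,) feature tensor
    kept = [v for v in views
            if torch.cosine_similarity(ref, clip_image_features(v), dim=-1).item()
            >= cos_thresh]
    return kept or list(views)  # fall back to all views if the filter is too strict
```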
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
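The last entry frames commonsense plausibility ranking as full-text scoring with a pre-trained language model rather than training a classification head. As a loose illustration (the paper scores with a masked language model; the sketch below uses a small causal LM from Hugging Face transformers as a stand-in), ranking answer choices by sequence log-likelihood looks roughly like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_score(model, tokenizer, text):
    """Average per-token log-likelihood of the full text under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # higher means more plausible

def rank_by_plausibility(premise, choices, model, tokenizer):
    """Pick the choice whose full-text continuation scores highest."""
    scores = [sequence_score(model, tokenizer, f"{premise} {c}") for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)

# Usage (illustrative): a small causal LM as the scorer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
best = rank_by_plausibility(
    "The man broke his toe because",
    ["he got a hole in his sock.", "he dropped a hammer on his foot."],
    model, tokenizer,
)
```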