Prompt-aligned Gradient for Prompt Tuning
- URL: http://arxiv.org/abs/2205.14865v3
- Date: Wed, 10 Jan 2024 06:24:46 GMT
- Title: Prompt-aligned Gradient for Prompt Tuning
- Authors: Beier Zhu and Yulei Niu and Yucheng Han and Yue Wu and Hanwang Zhang
- Abstract summary: We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from vision-language models (VLMs).
ProGrad only updates the prompt whose gradient is aligned to the "general direction", which is represented as the gradient of the KL loss of the pre-defined prompt prediction.
Experiments demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods.
- Score: 63.346864107288766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we
can craft a zero-shot classifier by "prompt", e.g., the confidence score of an
image being "[CLASS]" can be obtained by using the VLM provided similarity
measure between the image and the prompt sentence "a photo of a [CLASS]".
Therefore, prompt shows a great potential for fast adaptation of VLMs to
downstream tasks if we fine-tune the prompt-based similarity measure. However,
we find a common failure that improper fine-tuning may not only undermine the
prompt's inherent prediction for the task-related classes, but also for other
classes in the VLM vocabulary. Existing methods still address this problem by
using traditional anti-overfitting techniques such as early stopping and data
augmentation, which lack a principled solution specific to prompt. We present
Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from
forgetting the the general knowledge learned from VLMs. In particular, ProGrad
only updates the prompt whose gradient is aligned (or non-conflicting) to the
"general direction", which is represented as the gradient of the KL loss of the
pre-defined prompt prediction. Extensive experiments demonstrate the stronger
few-shot generalization ability of ProGrad over state-of-the-art prompt tuning
methods. Codes are available at https://github.com/BeierZhu/Prompt-align.
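The core mechanism described above is a gradient-level check between the few-shot task signal and the zero-shot "general knowledge" signal. The following is a minimal, hypothetical PyTorch sketch of that alignment rule, not the authors' implementation: it assumes `prompt` is the learnable context tensor, that `logits_tuned` and `logits_zeroshot` are CLIP logits obtained with the learned prompt and with the hand-crafted "a photo of a [CLASS]" prompt respectively, and that the plain SGD step and the `lam` scaling are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def prograd_step(prompt, logits_tuned, logits_zeroshot, labels, lr=2e-3, lam=1.0):
    """Hypothetical sketch of the ProGrad update rule.

    logits_tuned:    classification logits computed with the learnable prompt
                     (must depend on `prompt` in the autograd graph)
    logits_zeroshot: logits from the hand-crafted prompt "a photo of a [CLASS]",
                     i.e. similarities between the image embedding and the text
                     embeddings of the pre-defined prompt sentences
    """
    # Task-specific gradient: cross-entropy on the few-shot labels.
    ce_loss = F.cross_entropy(logits_tuned, labels)
    g_ce = torch.autograd.grad(ce_loss, prompt, retain_graph=True)[0]

    # "General direction": gradient of the KL loss between the tuned prediction
    # and the zero-shot (pre-defined prompt) prediction.
    kl_loss = F.kl_div(F.log_softmax(logits_tuned, dim=-1),
                       F.softmax(logits_zeroshot, dim=-1),
                       reduction="batchmean")
    g_kl = torch.autograd.grad(kl_loss, prompt)[0]

    dot = torch.dot(g_ce.flatten(), g_kl.flatten())
    if dot >= 0:
        # Aligned (non-conflicting): keep the task gradient as-is.
        g = g_ce
    else:
        # Conflicting: project out the component opposing the general direction.
        g = g_ce - lam * dot / g_kl.flatten().norm().pow(2) * g_kl

    with torch.no_grad():
        prompt -= lr * g  # plain SGD step for illustration only
    return ce_loss.detach()
```

When the two gradients agree, the update reduces to ordinary prompt tuning; when they conflict, only the component of the task gradient that does not oppose the general direction is applied, which is how the abstract describes preserving the zero-shot knowledge.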
Related papers
- Revisiting Prompt Pretraining of Vision-Language Models [13.888505919946578]
We propose a general framework termed Revisiting Prompt Pretraining (RPP).
RPP aims to improve fitting and generalization ability from two aspects: prompt structure and prompt supervision.
We additionally utilize soft labels derived from zero-shot probability predictions provided by a pretrained Contrastive Language Image Pretraining (CLIP) teacher model.
arXiv Detail & Related papers (2024-09-10T02:36:13Z) - PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection [59.34973469354926]
This paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD.
For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
arXiv Detail & Related papers (2024-04-08T06:53:30Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Progressive Visual Prompt Learning with Contrastive Feature Re-formation [15.385630262368661]
We propose a new Progressive Visual Prompt (ProVP) structure to strengthen the interactions among prompts of different layers.
Our ProVP can effectively propagate image embeddings to deep layers and behaves partially like an instance-adaptive prompt method.
To the best of our knowledge, we are the first to demonstrate that visual prompts in V-L models outperform previous prompt-based methods on downstream tasks.
arXiv Detail & Related papers (2023-04-17T15:54:10Z) - Iterative Prompt Learning for Unsupervised Backlit Image Enhancement [86.90993077000789]
We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT.
We show that the open-world CLIP prior aids in distinguishing between backlit and well-lit images.
Our method alternates between updating the prompt learning framework and enhancement network until visually pleasing results are achieved.
arXiv Detail & Related papers (2023-03-30T17:37:14Z) - Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% in harmonic mean across eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.