Self-regulating Prompts: Foundational Model Adaptation without
Forgetting
- URL: http://arxiv.org/abs/2307.06948v2
- Date: Thu, 24 Aug 2023 16:56:59 GMT
- Title: Self-regulating Prompts: Foundational Model Adaptation without
Forgetting
- Authors: Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman
Khan, Ming-Hsuan Yang and Fahad Shahbaz Khan
- Abstract summary: We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
- Score: 112.66832145320434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt learning has emerged as an efficient alternative for fine-tuning
foundational models, such as CLIP, for various downstream tasks. Conventionally
trained using the task-specific objective, i.e., cross-entropy loss, prompts
tend to overfit downstream data distributions and find it challenging to
capture task-agnostic general features from the frozen CLIP. This leads to the
loss of the model's original generalization capability. To address this issue,
our work introduces a self-regularization framework for prompting called
PromptSRC (Prompting with Self-regulating Constraints). PromptSRC guides the
prompts to optimize for both task-specific and task-agnostic general
representations using a three-pronged approach by: (a) regulating prompted
representations via mutual agreement maximization with the frozen model, (b)
regulating with self-ensemble of prompts over the training trajectory to encode
their complementary strengths, and (c) regulating with textual diversity to
mitigate sample diversity imbalance with the visual branch. To the best of our
knowledge, this is the first regularization framework for prompt learning that
avoids overfitting by jointly attending to pre-trained model features, the
training trajectory during prompting, and the textual diversity. PromptSRC
explicitly steers the prompts to learn a representation space that maximizes
performance on downstream tasks without compromising CLIP generalization. We
perform extensive experiments on 4 benchmarks where PromptSRC overall performs
favorably well compared to the existing methods. Our code and pre-trained
models are publicly available at: https://github.com/muzairkhattak/PromptSRC.
Related papers
- Revisiting Prompt Pretraining of Vision-Language Models [13.888505919946578]
We propose a general framework termed Revisiting Prompt Pretraining (RPP)
RPP targets at improving the fitting and generalization ability from two aspects: prompt structure and prompt supervision.
We additionally utilize soft labels derived from zero-shot probability predictions provided by a pretrained Contrastive Language Image Pretraining (CLIP) teacher model.
arXiv Detail & Related papers (2024-09-10T02:36:13Z) - Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
arXiv Detail & Related papers (2024-07-20T03:10:19Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - RESTORE: Towards Feature Shift for Vision-Language Prompt Learning [33.13407089704543]
We show that prompt tuning along only one branch of CLIP is the reason why the misalignment occurs.
Without proper regularization across the learnable parameters in different modalities, prompt learning violates the original pre-training constraints.
We propose RESTORE, a multi-modal prompt learning method that exerts explicit constraints on cross-modal consistency.
arXiv Detail & Related papers (2024-03-10T08:52:48Z) - Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model [86.9619638550683]
Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data.
However, these models display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of decision shortcuts''
arXiv Detail & Related papers (2024-03-01T09:01:53Z) - Any-Shift Prompting for Generalization over Distributions [66.29237565901734]
We propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning.
Within this framework, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from training to any test distribution.
The network generates the tailored test prompt with both training and test information in a feedforward pass, avoiding extra training costs at test time.
arXiv Detail & Related papers (2024-02-15T16:53:42Z) - Consistency-guided Prompt Learning for Vision-Language Models [23.4909421082857]
We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models.
Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting.
arXiv Detail & Related papers (2023-06-01T23:20:47Z) - Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream task, dubbed Prompt Regularization (ProReg)
ProReg uses the prediction by prompting the pretrained model to regularize the fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.