Consistency-guided Prompt Learning for Vision-Language Models
- URL: http://arxiv.org/abs/2306.01195v3
- Date: Tue, 27 Feb 2024 16:40:01 GMT
- Title: Consistency-guided Prompt Learning for Vision-Language Models
- Authors: Shuvendu Roy, Ali Etemad
- Abstract summary: We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models.
Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting.
- Score: 27.75143621836449
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning
method for vision-language models. Our approach improves the generalization of
large foundation models when fine-tuned on downstream tasks in a few-shot
setting. The basic idea of CoPrompt is to enforce a consistency constraint in
the prediction of the trainable and pre-trained models to prevent overfitting
on the downstream task. Additionally, we introduce the following two components
into our consistency constraint to further boost the performance: enforcing
consistency on two perturbed inputs and combining two dominant paradigms of
tuning: prompts and adapters. Enforcing consistency on perturbed inputs serves
to further regularize the consistency constraint, thereby improving
generalization. Moreover, the integration of adapters and prompts not only
enhances performance on downstream tasks but also offers increased tuning
flexibility in both input and output spaces. This facilitates more effective
adaptation to downstream tasks in a few-shot learning setting. Experiments show
that CoPrompt outperforms existing methods on a range of evaluation suites,
including base-to-novel generalization, domain generalization, and
cross-dataset evaluation. On generalization, CoPrompt improves the
state-of-the-art on zero-shot tasks and the overall harmonic mean over 11
datasets. Detailed ablation studies show the effectiveness of each of the
components in CoPrompt. We make our code available at
https://github.com/ShuvenduRoy/CoPrompt.
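
The consistency constraint described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation (see the repository linked above for that). The abstract does not say how consistency is measured, so the use of cosine distance, the `weight` parameter, and all function names here are assumptions, and the embeddings are toy values standing in for the outputs of the trainable and frozen pre-trained encoders on two differently perturbed inputs.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def consistency_loss(tuned_embeddings, frozen_embeddings):
    """Mean cosine distance between paired embeddings.

    tuned_embeddings: outputs of the trainable model (assumed to be
        computed on one perturbed view of each input).
    frozen_embeddings: outputs of the frozen pre-trained model (assumed
        to be computed on a different perturbed view of the same input).
    """
    distances = [
        cosine_distance(t, f)
        for t, f in zip(tuned_embeddings, frozen_embeddings)
    ]
    return sum(distances) / len(distances)


def total_loss(task_loss, tuned_embeddings, frozen_embeddings, weight=1.0):
    """Task loss plus a weighted consistency penalty (weight is hypothetical)."""
    return task_loss + weight * consistency_loss(
        tuned_embeddings, frozen_embeddings
    )


# Toy example: two samples with 2-dimensional embeddings.
tuned = [[1.0, 0.0], [0.0, 1.0]]
frozen = [[1.0, 0.0], [1.0, 0.0]]
print(total_loss(0.5, tuned, frozen, weight=2.0))
```

The penalty is zero when the trainable model's embeddings point in the same direction as the frozen model's and grows as they drift apart, which is one way to discourage the fine-tuned model from overfitting the few-shot data.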
Related papers
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z) - Revisiting the Robust Generalization of Adversarial Prompt Tuning [4.033827046965844]
We propose an adaptive Consistency-guided Adversarial Prompt Tuning (i.e., CAPT) framework to enhance the alignment of image and text features for adversarial examples.
We conduct experiments across 14 datasets and 4 data sparsity schemes to show the superiority of CAPT over other state-of-the-art adaption methods.
arXiv Detail & Related papers (2024-05-18T02:54:41Z) - RESTORE: Towards Feature Shift for Vision-Language Prompt Learning [33.13407089704543]
We show that prompt tuning along only one branch of CLIP is the reason why the misalignment occurs.
Without proper regularization across the learnable parameters in different modalities, prompt learning violates the original pre-training constraints.
We propose RESTORE, a multi-modal prompt learning method that exerts explicit constraints on cross-modal consistency.
arXiv Detail & Related papers (2024-03-10T08:52:48Z) - Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Generalized Few-Shot Continual Learning with Contrastive Mixture of Adapters [59.82088750033897]
We set up a Generalized FSCL (GFSCL) protocol involving both class- and domain-incremental situations.
We find that common continual learning methods have poor generalization ability on unseen domains.
To address this, we propose a rehearsal-free framework based on Vision Transformer (ViT), named Contrastive Mixture of Adapters (CMoA).
arXiv Detail & Related papers (2023-02-12T15:18:14Z) - Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process.
We equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
arXiv Detail & Related papers (2022-11-04T02:06:22Z) - Conditional Prompt Learning for Vision-Language Models [107.06776396086471]
A recently proposed method named Context Optimization (CoOp) turns context words in a prompt into a set of learnable vectors.
Our experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset.
arXiv Detail & Related papers (2022-03-10T18:59:41Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.