Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models
- URL: http://arxiv.org/abs/2303.06571v2
- Date: Thu, 17 Aug 2023 08:58:00 GMT
- Title: Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models
- Authors: Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang,
Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang
- Abstract summary: We introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework.
It helps pre-training models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
- Score: 137.74524357614285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning, a recently emerging paradigm, enables the powerful
vision-language pre-training models to adapt to downstream tasks in a parameter-
and data-efficient way, by learning the "soft prompts" to condition
frozen pre-training models. Though effective, it is particularly problematic in
the few-shot scenario, where prompt tuning performance is sensitive to the
initialization and requires a time-consuming process to find a good
initialization, thus restricting the fast adaptation ability of the
pre-training models. In addition, prompt tuning could undermine the
generalizability of the pre-training models, because the learnable prompt
tokens are easy to overfit to the limited training samples. To address these
issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM)
framework that jointly meta-learns an efficient soft prompt initialization for
better adaptation and a lightweight gradient regulating function for strong
cross-domain generalizability in a meta-learning paradigm using only the
unlabeled image-text pre-training data. Rather than designing a specific prompt
tuning method, our GRAM can be easily incorporated into various prompt tuning
methods in a model-agnostic way, and comprehensive experiments show that GRAM
brings about consistent improvement for them in several settings (e.g.,
few-shot learning, cross-domain generalization, and cross-dataset
generalization) over 11 datasets. Further, experiments show that GRAM enables the
orthogonal methods of textual and visual prompt tuning to work in a
mutually-enhanced way, offering better generalizability beyond the uni-modal
prompt tuning methods.
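The joint meta-learning idea in the abstract (a prompt initialization plus a gradient regulating function) can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic loss stands in for the prompt tuning loss of a frozen vision-language model, the element-wise sigmoid gate is only an assumed form of the regulating function, and all names (`adapt`, `sample_task`, `meta_train`, `INNER_LR`) are hypothetical. Dimension 0 of the prompt carries a task-specific signal, while dimension 1 sees noise only in the support data, so a useful gate should pass gradients on the first and damp them on the second.

```python
import math
import random

INNER_LR = 0.5  # step size of the single inner adaptation step (assumed value)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adapt(prompt, gate, support_target, inner_lr=INNER_LR):
    """One gated gradient step on the toy loss 0.5 * ||prompt - target||^2."""
    return [p - inner_lr * m * (p - t)
            for p, m, t in zip(prompt, gate, support_target)]


def sample_task(rng):
    """Return (support target, query target) for one synthetic task."""
    signal = rng.gauss(1.0, 1.0)  # task-specific, shared by support and query
    noise = rng.gauss(0.0, 1.0)   # appears in the support target only
    return [signal, noise], [signal, 0.0]


def meta_train(steps=2000, outer_lr=0.1, seed=0):
    rng = random.Random(seed)
    prompt = [0.0, 0.0]  # meta-learned prompt initialization
    phi = [0.0, 0.0]     # gate logits; gate = sigmoid(phi)
    for _ in range(steps):
        support, query = sample_task(rng)
        gate = [sigmoid(v) for v in phi]
        adapted = adapt(prompt, gate, support)
        for i in range(2):
            # Exact gradients of the query loss through the inner step.
            residual = adapted[i] - query[i]       # d L_query / d adapted
            grad_support = prompt[i] - support[i]  # inner-loop gradient
            d_prompt = residual * (1.0 - INNER_LR * gate[i])
            d_phi = residual * (-INNER_LR * grad_support
                                * gate[i] * (1.0 - gate[i]))
            prompt[i] -= outer_lr * d_prompt
            phi[i] -= outer_lr * d_phi
    return prompt, [sigmoid(v) for v in phi]


if __name__ == "__main__":
    init, gate = meta_train()
    print("prompt initialization:", init)
    print("gradient gate:", gate)
```

In this toy setting the gate on the signal dimension opens and the gate on the noisy dimension closes, which is the intuition behind regulating gradients to limit overfitting to a few support samples. The paper itself meta-learns from unlabeled image-text data rather than synthetic targets.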
Related papers
- Context-Aware Multimodal Pretraining [72.04020920042574]
We show that vision-language models can be trained to exhibit significantly increased few-shot adaptation.
We find up to four-fold improvements in test-time sample efficiency, and average few-shot adaptation gains of over 5%.
arXiv Detail & Related papers (2024-11-22T17:55:39Z)
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional quality prompts for offline reinforcement learning tasks.
We show that the Prompt diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in the meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
- Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL [29.01858866450715]
We present RLPrompt, which aims to find optimal prompt tokens leveraging soft Q-learning.
While the results show promise, we have observed that the prompts frequently appear unnatural, which impedes their interpretability.
We address this limitation by using sparse Tsallis entropy regularization, a principled approach to filtering out unlikely tokens from consideration.
arXiv Detail & Related papers (2024-07-20T03:10:19Z)
- Read-only Prompt Optimization for Vision-Language Few-shot Learning [20.66798356082751]
Learnable prompts can affect the internal representation within the self-attention module.
We propose a novel approach, Read-only Prompt Optimization (RPO).
Our experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization.
arXiv Detail & Related papers (2023-08-29T01:22:30Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of the prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
- Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization [40.45470744120691]
This paper proposes a novel Self-sUpervised meta-Prompt learning framework with MEta-gradient Regularization for few-shot generalization (SUPMER).
arXiv Detail & Related papers (2023-03-22T05:04:21Z)
- Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose a novel prompt learning paradigm, called MetaPrompt, that directly generates a domain-invariant prompt that can be generalized to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z)
- Unified Vision and Language Prompt Learning [86.1530128487077]
We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
arXiv Detail & Related papers (2022-10-13T17:50:24Z)