Unified Vision and Language Prompt Learning
- URL: http://arxiv.org/abs/2210.07225v1
- Date: Thu, 13 Oct 2022 17:50:24 GMT
- Title: Unified Vision and Language Prompt Learning
- Authors: Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
- Abstract summary: We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
- Score: 86.1530128487077
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Prompt tuning, a parameter- and data-efficient transfer learning paradigm
that tunes only a small number of parameters in a model's input space, has
become a trend in the vision community since the emergence of large
vision-language models like CLIP. We present a systematic study on two
representative prompt tuning methods, namely text prompt tuning and visual
prompt tuning. A major finding is that none of the unimodal prompt tuning
methods performs consistently well: text prompt tuning fails on data with high
intra-class visual variances while visual prompt tuning cannot handle low
inter-class variances. To combine the best from both worlds, we propose a
simple approach called Unified Prompt Tuning (UPT), which essentially learns a
tiny neural network to jointly optimize prompts across different modalities.
Extensive experiments on over 11 vision datasets show that UPT achieves a
better trade-off than the unimodal counterparts on few-shot learning
benchmarks, as well as on domain generalization benchmarks. Code and models
will be released to facilitate future research.
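As a rough illustration of the idea in the abstract, the sketch below passes one shared set of prompt vectors through a single tiny network and then splits the result into text prompts and visual prompts. It is a minimal sketch under stated assumptions, not the authors' implementation: the residual linear map stands in for whatever "tiny neural network" UPT actually uses, and the names `tiny_network`, `unified_prompts`, `n_text`, and `n_visual`, as well as all sizes, are hypothetical.

```python
import random


def make_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def tiny_network(prompts, weight):
    """Apply one shared linear map with a residual connection to every prompt.

    Assumption: this stands in for the paper's tiny network; the real
    architecture is not described in the abstract.
    """
    out = []
    for p in prompts:
        transformed = [
            sum(p[i] * weight[i][j] for i in range(len(p)))
            for j in range(len(weight[0]))
        ]
        out.append([a + b for a, b in zip(p, transformed)])
    return out


def unified_prompts(shared, weight, n_text):
    """Transform the shared prompts, then split them by modality."""
    mixed = tiny_network(shared, weight)
    text_prompts = mixed[:n_text]      # would be fed to the text encoder
    visual_prompts = mixed[n_text:]    # would be fed to the image encoder
    return text_prompts, visual_prompts


rng = random.Random(0)
dim, n_text, n_visual = 8, 4, 4        # hypothetical sizes
shared = make_matrix(n_text + n_visual, dim, rng)  # learnable in practice
weight = make_matrix(dim, dim, rng)                # learnable in practice

text_prompts, visual_prompts = unified_prompts(shared, weight, n_text)
print(len(text_prompts), len(visual_prompts), len(text_prompts[0]))  # 4 4 8
```

Because both prompt sets come from the same `shared` vectors and the same `weight`, a training loss on either modality would update parameters that affect both, which is one way to read "jointly optimize prompts across different modalities". No training loop is shown here.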
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
- Distribution-Aware Prompt Tuning for Vision-Language Models [20.02599087680773]
A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed.
Inspired by this observation, we propose distribution-aware prompt tuning (DAPT) for vision-language models.
Our experiments on 11 benchmark datasets demonstrate that our method significantly improves generalizability.
arXiv Detail & Related papers (2023-09-06T23:49:11Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models [12.397136690734865]
We propose a novel approach called Multi-modal Deep-symphysis Prompt Tuning, dubbed as MuDPT.
MuDPT extends independent multi-modal prompt tuning by learning a model-agnostic transformative network to allow deep hierarchical bi-directional prompt fusion.
Compared with state-of-the-art methods, MuDPT achieves better recognition and generalization ability by a clear margin.
arXiv Detail & Related papers (2023-06-20T09:15:52Z)
- Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z)
- Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose a novel prompt learning paradigm, called MetaPrompt, that directly generates a domain-invariant prompt that can be generalized to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z)
- CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models.
CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z)
- Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model [39.722927180264584]
We propose a novel Dual-modality Prompt Tuning (DPT) paradigm through learning text and visual prompts simultaneously.
To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning scheme is proposed.
arXiv Detail & Related papers (2022-08-17T15:06:36Z)