PRE: Vision-Language Prompt Learning with Reparameterization Encoder
- URL: http://arxiv.org/abs/2309.07760v2
- Date: Mon, 6 Nov 2023 12:18:25 GMT
- Title: PRE: Vision-Language Prompt Learning with Reparameterization Encoder
- Authors: Anh Pham Thi Minh, An Duc Nguyen, Georgios Tzimiropoulos
- Abstract summary: Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks.
To attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions.
To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens.
- Score: 26.017809323969285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained vision-language models such as CLIP have demonstrated great
potential in zero-shot transferability to downstream tasks. However, to attain
optimal performance, the manual selection of prompts is necessary to improve
alignment between the downstream image distribution and the textual class
descriptions. This manual prompt engineering is the major challenge for
deploying such models in practice since it requires domain expertise and is
extremely time-consuming. To avoid non-trivial prompt engineering, recent work
Context Optimization (CoOp) introduced the concept of prompt learning to the
vision domain using learnable textual tokens. While CoOp can achieve
substantial improvements over manual prompts, its learned context is worse
generalizable to wider unseen classes within the same dataset. In this work, we
present Prompt Learning with Reparameterization Encoder (PRE) - a simple and
efficient method that enhances the generalization ability of the learnable
prompt to unseen classes while maintaining the capacity to learn Base classes.
Instead of directly optimizing the prompts, PRE employs a prompt encoder to
reparameterize the input prompt embeddings, enhancing the exploration of
task-specific knowledge from few-shot samples. Experiments and extensive
ablation studies on 8 benchmarks demonstrate that our approach is an efficient
method for prompt learning. Specifically, PRE achieves a notable enhancement of
5.60% in average accuracy on New classes and 3% in Harmonic mean compared to
CoOp in the 16-shot setting, all achieved within a good training time.
Related papers
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z) - AAPL: Adding Attributes to Prompt Learning for Vision-Language Models [6.32186874112557]
We propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts.
We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
arXiv Detail & Related papers (2024-04-25T17:51:10Z) - Visual-Language Prompt Tuning with Knowledge-guided Context Optimization [96.27531485377871]
Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge.
We introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes.
arXiv Detail & Related papers (2023-03-23T14:04:23Z) - TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA)
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves 5.33x on average improvement in sample efficiency when compared to the traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z) - CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel underlinetextbfCounterfactual underlinetextbfPrompt underlinetextbfLearning (CPL) method for vision and language models.
CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z) - LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of
Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z) - Learning to Prompt for Vision-Language Models [82.25005817904027]
Vision-language pre-training has emerged as a promising alternative for representation learning.
It shifts from the tradition of using images and discrete labels for learning a fixed set of weights, seen as visual concepts, to aligning images and raw text for two separate encoders.
Such a paradigm benefits from a broader source of supervision and allows zero-shot transfer to downstream tasks.
arXiv Detail & Related papers (2021-09-02T17:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.