Prompting through Prototype: A Prototype-based Prompt Learning on
Pretrained Vision-Language Models
- URL: http://arxiv.org/abs/2210.10841v1
- Date: Wed, 19 Oct 2022 19:13:07 GMT
- Title: Prompting through Prototype: A Prototype-based Prompt Learning on
Pretrained Vision-Language Models
- Authors: Yue Zhang, Hongliang Fei, Dingcheng Li, Tan Yu, Ping Li
- Abstract summary: Recent works have demonstrated that prompt learning is particularly useful for few-shot learning, where there is limited training data.
We develop a prototype-based prompt learning method (PTP) to overcome the limitations of task-level and instance-level prompting.
In PTP, an image prototype represents the centroid of an image cluster in the latent space, and a prompt prototype is defined as a soft prompt in the continuous space.
- Score: 46.02539753821322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt learning is a new learning paradigm that reformulates
downstream tasks to resemble the pretraining tasks of pretrained models by
leveraging textual prompts. Recent works have demonstrated that prompt
learning is particularly useful for few-shot learning, where training data is
limited. Depending on the granularity of prompts, these methods can be roughly
divided into task-level prompting and instance-level prompting. Task-level
prompting methods learn one universal prompt for all input samples, which is
efficient but ineffective at capturing subtle differences among classes.
Instance-level prompting methods learn a specific prompt for each input, which
is effective but inefficient. In this work, we develop a novel prototype-based
prompt learning method to overcome these limitations. In particular, we focus
on few-shot image recognition tasks on pretrained vision-language models
(PVLMs) and develop a method of prompting through prototype (PTP), in which we
define $K$ image prototypes and $K$ prompt prototypes. In PTP, an image
prototype represents the centroid of an image cluster in the latent space, and
a prompt prototype is defined as a soft prompt in the continuous space. The
similarity between a query image and an image prototype determines how much
the prediction relies on the corresponding prompt prototype; hence, in PTP,
similar images are prompted in similar ways. Through extensive experiments on
seven real-world benchmarks, we show that PTP effectively leverages the latent
knowledge of PVLMs and adapts to various PVLM architectures. Moreover, through
detailed analysis, we discuss the pros and cons of prompt learning and
parameter-efficient fine-tuning in the context of few-shot learning.
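
To make the weighting scheme concrete, the following minimal PyTorch-style sketch shows how the $K$ image prototypes and $K$ prompt prototypes described above could interact: the softmax-normalized similarity between a query image embedding and each image prototype weights the corresponding soft prompt prototype. The tensor shapes, the temperature value, and the module name are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypePrompting(nn.Module):
    """Minimal sketch of prototype-based prompting (PTP-style).

    Assumes K image prototypes (centroids in the image latent space) and
    K soft prompt prototypes (learnable token embeddings). Shapes and the
    temperature are illustrative choices, not the paper's exact settings.
    """

    def __init__(self, k=4, embed_dim=512, prompt_len=16, token_dim=512,
                 temperature=0.07):
        super().__init__()
        # K image prototypes: cluster centroids in the image embedding space.
        self.image_prototypes = nn.Parameter(torch.randn(k, embed_dim))
        # K prompt prototypes: soft prompts (continuous token embeddings).
        self.prompt_prototypes = nn.Parameter(
            torch.randn(k, prompt_len, token_dim) * 0.02)
        self.temperature = temperature

    def forward(self, image_features):
        """image_features: (batch, embed_dim) query image embeddings."""
        img = F.normalize(image_features, dim=-1)
        protos = F.normalize(self.image_prototypes, dim=-1)
        # Similarity of each query image to each image prototype -> (batch, K).
        sim = img @ protos.t() / self.temperature
        weights = sim.softmax(dim=-1)
        # Combine prompt prototypes: similar images receive similar prompts.
        # (batch, K) x (K, prompt_len, token_dim) -> (batch, prompt_len, token_dim)
        prompts = torch.einsum("bk,kld->bld", weights, self.prompt_prototypes)
        return prompts, weights

In a full pipeline, the resulting per-image prompt would be prepended to the class-name token embeddings, passed through the frozen text encoder, and matched against the image embedding by cosine similarity, following the usual CLIP-style zero-shot classification recipe.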
Related papers
- Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning [16.613744920566436]
Proto-CLIP is a framework for few-shot learning based on large-scale vision-language models such as CLIP.
Proto-CLIP adapts the image and text encoder embeddings from CLIP in a joint fashion using few-shot examples.
Proto-CLIP has both training-free and fine-tuned variants.
arXiv Detail & Related papers (2023-07-06T15:41:53Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task into a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Multi-Prompt with Depth Partitioned Cross-Modal Learning [25.239388488952375]
Partitioned Multi-modal Prompt (PMPO) is a multi-modal prompting technique that extends the soft prompt from a single learnable prompt to multiple prompts.
Our method divides the visual encoder depths and connects learnable prompts to the separated visual depths, enabling different prompts to capture hierarchical contextual depths.
We evaluate the effectiveness of our approach on three challenging tasks: new class generalization, cross-dataset evaluation, and domain generalization.
arXiv Detail & Related papers (2023-05-10T14:54:29Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition [40.329190454146996]
MultimOdal PRototype-ENhanced Network (MORN) uses semantic information of label texts as multimodal information to enhance prototypes.
We conduct extensive experiments on four popular few-shot action recognition datasets.
arXiv Detail & Related papers (2022-12-09T14:24:39Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- IDPG: An Instance-Dependent Prompt Generation Method [58.45110542003139]
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt to each input instance during the model training stage.
We propose a conditional prompt generation method to generate prompts for each input instance.
arXiv Detail & Related papers (2022-04-09T15:45:27Z)
- Instance-aware Prompt Learning for Language Understanding and Generation [49.22899822734549]
We propose an instance-aware prompt learning method that learns a different prompt for each instance.
Our method achieves the state-of-the-art on the SuperGLUE few-shot learning benchmark.
arXiv Detail & Related papers (2022-01-18T17:03:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.