Multitask Vision-Language Prompt Tuning
- URL: http://arxiv.org/abs/2211.11720v2
- Date: Tue, 22 Nov 2022 07:24:16 GMT
- Title: Multitask Vision-Language Prompt Tuning
- Authors: Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E.
Gonzalez, Kurt Keutzer, Trevor Darrell
- Abstract summary: We propose multitask vision-language prompt tuning (MVLPT).
MVLPT incorporates cross-task knowledge into prompt tuning for vision-language models.
Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods.
- Score: 103.5967011236282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt Tuning, conditioning on task-specific learned prompt vectors, has
emerged as a data-efficient and parameter-efficient method for adapting large
pretrained vision-language models to multiple downstream tasks. However,
existing approaches usually consider learning prompt vectors for each task
independently from scratch, thereby failing to exploit the rich shareable
knowledge across different vision-language tasks. In this paper, we propose
multitask vision-language prompt tuning (MVLPT), which incorporates cross-task
knowledge into prompt tuning for vision-language models. Specifically, (i) we
demonstrate the effectiveness of learning a single transferable prompt from
multiple source tasks to initialize the prompt for each target task; (ii) we
show many target tasks can benefit each other from sharing prompt vectors and
thus can be jointly learned via multitask prompt tuning. We benchmark the
proposed MVLPT using three representative prompt tuning methods, namely text
prompt tuning, visual prompt tuning, and the unified vision-language prompt
tuning. Results on 20 vision tasks demonstrate that the proposed approach
outperforms all single-task baseline prompt tuning methods, setting the new
state-of-the-art on the few-shot ELEVATER benchmarks and cross-task
generalization benchmarks. To understand where the cross-task knowledge is most
effective, we also conduct a large-scale study on task transferability with 20
vision tasks in 400 combinations for each prompt tuning method. It shows that
the most performant MVLPT for each prompt tuning method prefers different task
combinations and many tasks can benefit each other, depending on their visual
similarity and label similarity. Code is available at
https://github.com/sIncerass/MVLPT.
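To make ideas (i) and (ii) above concrete, here is a minimal PyTorch-style sketch that uses a small frozen transformer as a stand-in for the CLIP backbone; the module names, shapes, and training loop are illustrative assumptions, not the released MVLPT implementation.

```python
import torch
import torch.nn as nn

class PromptedClassifier(nn.Module):
    """Frozen toy backbone plus learnable prompt vectors prepended to the input tokens."""
    def __init__(self, prompt: nn.Parameter, num_classes: int, dim: int = 64):
        super().__init__()
        self.prompt = prompt  # the same Parameter object can be shared across tasks (idea ii)
        self.backbone = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        for p in self.backbone.parameters():  # backbone stays frozen; only prompts and heads train
            p.requires_grad_(False)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        prompts = self.prompt.expand(tokens.size(0), -1, -1)        # (batch, prompt_len, dim)
        feats = self.backbone(torch.cat([prompts, tokens], dim=1))  # prepend prompts to tokens
        return self.head(feats.mean(dim=1))

dim, prompt_len = 64, 4

# (i) a single prompt learned on several source tasks (source training omitted here) ...
source_prompt = nn.Parameter(0.02 * torch.randn(1, prompt_len, dim))
# ... is used to initialize the prompt for the target tasks.
shared_prompt = nn.Parameter(source_prompt.detach().clone())

# (ii) two target tasks tuned jointly through the same shared prompt vectors.
models = {"task_a": PromptedClassifier(shared_prompt, num_classes=10, dim=dim),
          "task_b": PromptedClassifier(shared_prompt, num_classes=5, dim=dim)}
params = [shared_prompt] + [p for m in models.values() for p in m.head.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

x = {"task_a": torch.randn(8, 16, dim), "task_b": torch.randn(8, 16, dim)}   # dummy features
y = {"task_a": torch.randint(0, 10, (8,)), "task_b": torch.randint(0, 5, (8,))}
loss = sum(nn.functional.cross_entropy(m(x[t]), y[t]) for t, m in models.items())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this sketch, the prompt learned on the source tasks initializes a single `shared_prompt`, and both target-task heads backpropagate into that same parameter, which is the multitask-sharing idea in miniature.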
Related papers
- Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition [0.0]
Multi-task prompt tuning has garnered considerable attention for its inherent modularity and potential to enhance parameter-efficient transfer learning.
This paper aims to analyze and improve the performance of multiple tasks by facilitating the transfer of knowledge between their corresponding prompts in a multi-task setting.
arXiv Detail & Related papers (2024-08-23T17:01:51Z)
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning [43.639430661322585]
We propose multitask prompt tuning (MPT).
MPT learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts.
We then learn multiplicative low-rank updates to this shared prompt to efficiently adapt it to each downstream target task.
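A rough sketch of such a multiplicative low-rank update follows; the rank-1 factors, shapes, and initialization are assumptions for illustration, not MPT's exact parameterization.

```python
import torch
import torch.nn as nn

class LowRankPrompt(nn.Module):
    """Per-task prompt = shared prompt * (u v^T); only the rank-1 factors u, v are tuned."""
    def __init__(self, shared_prompt: torch.Tensor):
        super().__init__()
        length, dim = shared_prompt.shape
        self.register_buffer("shared", shared_prompt)  # shared prompt kept frozen in this sketch
        self.u = nn.Parameter(torch.ones(length, 1))   # initialized so u @ v is all ones,
        self.v = nn.Parameter(torch.ones(1, dim))      # i.e. the task prompt starts as the shared one

    def forward(self) -> torch.Tensor:
        return self.shared * (self.u @ self.v)         # element-wise (Hadamard) rescaling

shared = 0.02 * torch.randn(4, 64)      # stands in for the prompt distilled from source tasks
task_prompt = LowRankPrompt(shared)
print(task_prompt().shape)              # torch.Size([4, 64]); only 4 + 64 scalars are trainable
```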
arXiv Detail & Related papers (2023-03-06T03:25:59Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Prompt Tuning with Soft Context Sharing for Vision-Language Models [42.61889428498378]
We propose a novel method to tune pre-trained vision-language models on multiple target few-shot tasks jointly.
We show that the proposed method, SoftCPT (Soft Context Sharing for Prompt Tuning), significantly outperforms single-task prompt tuning methods.
arXiv Detail & Related papers (2022-08-29T10:19:10Z)
- Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
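A simplified sketch of an input-conditioned mixture over frozen source prompts is shown below; the module and shapes are assumptions for illustration only, not ATTEMPT's released code.

```python
import torch
import torch.nn as nn

class PromptMixture(nn.Module):
    """Instance-specific prompt as an attention-weighted mixture of frozen source prompts."""
    def __init__(self, source_prompts: torch.Tensor, dim: int):
        super().__init__()
        self.register_buffer("sources", source_prompts)         # (num_sources, length, dim), frozen
        self.target = nn.Parameter(0.02 * torch.randn_like(source_prompts[0]))
        self.query = nn.Linear(dim, dim)                         # scores each candidate prompt per input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) pooled representation of the input
        candidates = torch.cat([self.sources, self.target.unsqueeze(0)], dim=0)
        keys = candidates.mean(dim=1)                            # (num_sources + 1, dim)
        weights = torch.softmax(self.query(x) @ keys.T, dim=-1)  # (batch, num_sources + 1)
        return torch.einsum("bk,kld->bld", weights, candidates)  # one mixed prompt per example

mixer = PromptMixture(torch.randn(3, 8, 32), dim=32)             # 3 frozen source prompts
print(mixer(torch.randn(4, 32)).shape)                           # torch.Size([4, 8, 32])
```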
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
- On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, but these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue by randomly allocating a subset of tasks to each sample.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample over successive epochs.
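A toy sketch of per-sample random task allocation follows; the loss masking below only illustrates the idea and is not the paper's exact training scheme.

```python
import torch

def allocated_loss(per_task_losses: torch.Tensor, tasks_per_sample: int) -> torch.Tensor:
    """per_task_losses: (batch, num_tasks) unreduced losses, one column per task."""
    batch, num_tasks = per_task_losses.shape
    # for every sample, draw `tasks_per_sample` task indices uniformly without replacement
    chosen = torch.rand(batch, num_tasks).argsort(dim=1)[:, :tasks_per_sample]
    mask = torch.zeros(batch, num_tasks).scatter_(1, chosen, 1.0)
    return (per_task_losses * mask).sum() / mask.sum()

losses = torch.rand(8, 3, requires_grad=True)  # stand-in for real per-task, per-sample losses
print(allocated_loss(losses, tasks_per_sample=2))
```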
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.