Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning
- URL: http://arxiv.org/abs/2309.06123v1
- Date: Tue, 12 Sep 2023 10:47:37 GMT
- Title: Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning
- Authors: Chunqing Ruan, Hongjian Wang
- Abstract summary: We propose a Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic, instance-wise token for each image.
In this way, it captures the unique visual features of each image, making it better suited to downstream visual tasks.
Experiments on a wide range of downstream recognition tasks show that DVPT outperforms other PETL methods.
- Score: 0.8430481660019451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient transfer learning (PETL) is an emerging research
area that aims to adapt large-scale pre-trained models to downstream tasks.
Recent advances have achieved great success in reducing storage and
computation costs. However, these methods do not take instance-specific
visual cues into account for visual tasks. In this paper, we propose a
Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic,
instance-wise token for each image. In this way, it captures the unique
visual features of each image, making it better suited to downstream visual
tasks. We design a Meta-Net module that generates learnable prompts
conditioned on each image, thereby capturing dynamic, instance-wise visual
features. Extensive experiments on a wide range of downstream recognition
tasks show that DVPT outperforms other PETL methods. More importantly, DVPT
even outperforms full fine-tuning on 17 out of 19 downstream tasks while
maintaining high parameter efficiency. Our code will be released soon.
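Since the authors' code is not yet available, here is a minimal PyTorch sketch of the mechanism the abstract describes: a lightweight Meta-Net maps each image's patch embeddings to a set of instance-wise prompt tokens, which are prepended to the token sequence fed to the frozen backbone, so that only the Meta-Net and the task head are trained. All names (MetaNet, DVPTWrapper), shapes, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the DVPT idea; module names, shapes, and
# hyperparameters are assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MetaNet(nn.Module):
    """Lightweight bottleneck MLP: pooled image feature -> P prompt tokens."""
    def __init__(self, embed_dim=768, num_prompts=4, hidden_dim=128):
        super().__init__()
        self.num_prompts = num_prompts
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_prompts * embed_dim),
        )

    def forward(self, patch_embeds):            # (B, N, D)
        pooled = patch_embeds.mean(dim=1)       # (B, D): global cue per image
        prompts = self.net(pooled)              # (B, P*D)
        return prompts.view(pooled.size(0), self.num_prompts, -1)

class DVPTWrapper(nn.Module):
    """Prepends per-image prompts; only MetaNet and head receive gradients."""
    def __init__(self, backbone, embed_dim=768, num_prompts=4, num_classes=100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False             # keep pre-trained weights frozen
        self.meta_net = MetaNet(embed_dim, num_prompts)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeds):                        # (B, N, D)
        prompts = self.meta_net(patch_embeds)               # (B, P, D), dynamic
        tokens = torch.cat([prompts, patch_embeds], dim=1)  # (B, P+N, D)
        feats = self.backbone(tokens)                       # frozen encoder
        return self.head(feats.mean(dim=1))
```

With, for example, a torch.nn.TransformerEncoder built with batch_first=True standing in for the pre-trained ViT encoder, only meta_net and head have requires_grad=True, which is what the parameter-efficiency claim rests on.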
Related papers
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification [15.780372479483235]
PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models.
We propose the Meta Visual Prompt Tuning (MVP) method, which updates only the newly added prompt parameters while keeping the pre-trained backbone frozen.
We introduce a novel data augmentation strategy based on patch embedding recombination to enhance the representation and diversity of scenes for classification purposes.
arXiv Detail & Related papers (2023-09-17T13:51:05Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view, called visual-PETL (V-PETL), to investigate the different aspects affecting the trade-off between performance and parameter efficiency.
An effective scheme, Swin-BAPAT, derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
- Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model [39.722927180264584]
We propose a novel Dual-modality Prompt Tuning (DPT) paradigm through learning text and visual prompts simultaneously.
To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning scheme is proposed.
arXiv Detail & Related papers (2022-08-17T15:06:36Z)
- Pro-tuning: Unified Prompt Tuning for Vision Tasks [133.12978197265596]
Fine-tuning is the de facto approach to leveraging pre-trained vision models for downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
arXiv Detail & Related papers (2022-07-28T21:09:31Z)
- Parameter-Efficient Image-to-Video Transfer Learning [66.82811235484607]
Large pre-trained models have recently emerged with promising performance on a variety of downstream tasks of interest.
Due to ever-growing model sizes, the standard full fine-tuning based task-adaptation strategy becomes costly in terms of model training and storage.
We propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task.
arXiv Detail & Related papers (2022-06-27T18:02:29Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen (see the sketch after this list).
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
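For contrast with the dynamic, per-image prompts sketched above, the static prompt tuning that VPT describes amounts to a single shared set of learnable tokens, plus a quick sanity check of the "less than 1% of model parameters" figure. Class names, sizes, and the stand-in encoder here are assumptions, not the released VPT code.

```python
# Minimal sketch of VPT-style *static* prompt tuning; all names and
# sizes are illustrative assumptions.
import torch
import torch.nn as nn

class StaticPromptTuner(nn.Module):
    def __init__(self, backbone, embed_dim=768, num_prompts=8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False             # backbone stays frozen
        # One shared, input-independent set of learnable prompt tokens.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, patch_embeds):                        # (B, N, D)
        prompts = self.prompts.expand(patch_embeds.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, patch_embeds], dim=1))

# Sanity check with a ViT-Base-sized stand-in encoder (not VPT's exact setup):
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                   dim_feedforward=3072, batch_first=True)
model = StaticPromptTuner(nn.TransformerEncoder(layer, num_layers=12))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")  # far below 1%
```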
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.