Pro-tuning: Unified Prompt Tuning for Vision Tasks
- URL: http://arxiv.org/abs/2207.14381v1
- Date: Thu, 28 Jul 2022 21:09:31 GMT
- Title: Pro-tuning: Unified Prompt Tuning for Vision Tasks
- Authors: Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo,
Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan
- Abstract summary: Fine-tuning is the de-facto approach to leveraging pre-trained vision models for downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
- Score: 133.12978197265596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computer vision, fine-tuning is the de-facto approach to
leveraging pre-trained vision models for downstream tasks. However, deploying
it in practice is challenging because it performs a parameter-inefficient
global update and relies heavily on high-quality downstream data. Recently,
prompt-based learning, which adds a task-relevant prompt to adapt downstream
tasks to pre-trained models, has drastically boosted the performance of many
natural language downstream tasks. In this work, we bring this notable
prompt-driven transfer ability to vision models as an alternative to
fine-tuning. To this end, we propose parameter-efficient Prompt tuning
(Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
The key to Pro-tuning is prompt-based tuning, i.e., learning task-specific
vision prompts for downstream input images while the pre-trained model remains
frozen. By training only a few additional parameters, it works with diverse
CNN-based and Transformer-based architectures. Extensive experiments show that
Pro-tuning outperforms fine-tuning across a broad range of vision tasks and
scenarios, including image classification (generic objects, class imbalance,
image corruption, adversarial robustness, and out-of-distribution
generalization) and dense prediction tasks such as object detection and
semantic segmentation.
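The abstract only outlines the mechanism, so below is a minimal sketch of prompt-based tuning in that spirit rather than the authors' exact design: the pre-trained backbone is frozen and only a small prompt module plus a task head are trained. The PromptBlock structure, its placement after the last ResNet stage, and the hidden width are illustrative assumptions.

```python
# Minimal sketch of prompt-based tuning (assumptions, not Pro-tuning's exact design):
# freeze a pre-trained backbone, train only a small prompt block and a task head.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class PromptBlock(nn.Module):
    """Hypothetical lightweight module producing a task-specific prompt
    that is added residually to a frozen backbone feature map."""

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.down = nn.Conv2d(channels, hidden, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat + self.up(self.act(self.down(feat)))  # residual prompt


class PromptTunedResNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = resnet50(weights="IMAGENET1K_V2")
        for p in self.backbone.parameters():
            p.requires_grad = False              # backbone stays frozen
        self.prompt = PromptBlock(channels=2048)  # illustrative: prompt after the last stage
        self.head = nn.Linear(2048, num_classes)  # new task-specific head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = self.backbone
        x = b.maxpool(b.relu(b.bn1(b.conv1(x))))
        x = b.layer4(b.layer3(b.layer2(b.layer1(x))))
        x = self.prompt(x)                        # inject the learned prompt
        x = torch.flatten(b.avgpool(x), 1)
        return self.head(x)


model = PromptTunedResNet(num_classes=100)
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))          # only prompt + head parameters
```

Counting only the parameters with requires_grad=True illustrates how small the per-task update is compared with fine-tuning the whole backbone.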
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images from limited samples is challenging.
We propose an intra-task mutual attention method for few-shot learning that splits the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Visual Tuning [143.43997336384126]
Fine-tuning visual models has been widely shown to achieve promising performance on many downstream visual tasks.
Recent advances can achieve performance superior to fully tuning the whole set of pre-trained parameters.
This survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models.
arXiv Detail & Related papers (2023-05-10T11:26:36Z)
- Rethinking Visual Prompt Learning as Masked Visual Token Modeling [106.71983630652323]
We propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform downstream visual classification into the pre-trained masked visual token prediction task.
VPTM is the first visual prompt method on a generative pre-trained visual model; it achieves consistency between pre-training and downstream visual classification through task reformulation.
arXiv Detail & Related papers (2023-03-09T02:43:10Z)
- Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks using only a few trainable parameters.
Specifically, Polyhistor achieves accuracy competitive with the state of the art while using only 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view, called visual-PETL (V-PETL), to investigate the different aspects affecting the trade-off.
An effective scheme, Swin-BAPAT, derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)
- Parameter-Efficient Image-to-Video Transfer Learning [66.82811235484607]
Large pre-trained models for various downstream tasks of interest have recently emerged with promising performance.
Due to ever-growing model size, the standard full fine-tuning based task adaptation strategy becomes costly in terms of model training and storage.
We propose a new spatio-temporal adapter (ST-Adapter) for parameter-efficient fine-tuning per video task.
arXiv Detail & Related papers (2022-06-27T18:02:29Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen; a minimal sketch of this input-space prompting idea follows this list.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
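As referenced in the Visual Prompt Tuning entry above, the following is a minimal sketch of the input-space prompting idea it summarizes; the number of prompt tokens, the embedding width, and the insertion point are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of input-space visual prompt tokens (VPT-style); token count
# and embedding width are illustrative assumptions.
import torch
import torch.nn as nn


class PromptedViTInput(nn.Module):
    """Prepends learnable prompt tokens to a frozen ViT's patch embeddings."""

    def __init__(self, embed_dim: int = 768, num_prompts: int = 10):
        super().__init__()
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim) from a frozen patch embedding
        batch = patch_tokens.shape[0]
        prompts = self.prompts.expand(batch, -1, -1)
        return torch.cat([prompts, patch_tokens], dim=1)  # prompts go first


tokens = torch.randn(4, 196, 768)   # e.g. 14x14 patches of a 224x224 image
prompted = PromptedViTInput()(tokens)
print(prompted.shape)               # torch.Size([4, 206, 768])
```

Only the prompt parameters (and typically a task head) would be updated during training; the Transformer backbone that consumes the prompted token sequence stays frozen.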
This list is automatically generated from the titles and abstracts of the papers on this site.