PVP: Pre-trained Visual Parameter-Efficient Tuning
- URL: http://arxiv.org/abs/2304.13639v1
- Date: Wed, 26 Apr 2023 15:55:29 GMT
- Title: PVP: Pre-trained Visual Parameter-Efficient Tuning
- Authors: Zhao Song, Ke Yang, Naiyang Guan, Junjie Zhu, Peng Qiao, Qingyong Hu
- Abstract summary: Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
- Score: 29.05396521860764
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale pre-trained transformers have demonstrated remarkable success in
various computer vision tasks. However, it is still highly challenging to fully
fine-tune these models for downstream tasks due to their high computational and
storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques,
e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have
significantly reduced the computation and storage cost by inserting lightweight
prompt modules into the pre-trained models and tuning these prompt modules with
a small number of trainable parameters, while keeping the transformer backbone
frozen. Although only a few parameters need to be adjusted, most PETuning
methods still require a significant amount of downstream task training data to
achieve good results. Performance is inadequate in low-data regimes,
especially when there are only one or two examples per class. To this end, we
first empirically identify that the poor performance is mainly due to the
inappropriate way of initializing prompt modules, a finding that has also been
verified in pre-trained language models. Next, we propose a Pre-trained Visual
Parameter-efficient (PVP) Tuning framework, which pre-trains the
parameter-efficient tuning modules first and then leverages the pre-trained
modules along with the pre-trained transformer backbone to perform
parameter-efficient tuning on downstream tasks. Experimental results on five
Fine-Grained Visual Classification (FGVC) and VTAB-1k datasets demonstrate that
our proposed method significantly outperforms state-of-the-art PETuning
methods.
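The two-stage recipe described in the abstract (first pre-train the parameter-efficient modules, then reuse them as the initialization for downstream parameter-efficient tuning) can be illustrated with a short PyTorch sketch. The LoRA-style adapter, toy linear backbone, and proxy objective below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the pre-train-then-reuse idea for PEFT modules,
# using a LoRA-style adapter on a toy linear "backbone".
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update W + B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # backbone stays frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


# Stage 1: pre-train the parameter-efficient modules (here on a hypothetical
# proxy objective) so downstream tuning starts from a good initialization.
backbone = nn.Linear(768, 768)                # stands in for a ViT block
peft_layer = LoRALinear(backbone, rank=4)
opt = torch.optim.AdamW([peft_layer.lora_A, peft_layer.lora_B], lr=1e-3)
proxy_x = torch.randn(32, 768)
loss = peft_layer(proxy_x).pow(2).mean()      # placeholder proxy objective
loss.backward()
opt.step()

# Stage 2: reuse the pre-trained LoRA weights as the initialization for
# parameter-efficient tuning on a downstream task (few examples per class).
pretrained_init = {k: v.clone() for k, v in peft_layer.state_dict().items()
                   if "lora" in k}
downstream_layer = LoRALinear(backbone, rank=4)
downstream_layer.load_state_dict(pretrained_init, strict=False)
```

In the paper's setting the same idea is applied to prompt/adapter modules inside a pre-trained vision transformer; the sketch only conveys the pre-train-then-reuse initialization of the PEFT parameters.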
Related papers
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, showing up to a 9.6% improvement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning [19.17362588650503]
Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation.
arXiv Detail & Related papers (2024-02-06T14:03:15Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
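For reference, the prompt-insertion mechanism summarized in the Visual Prompt Tuning entry above can be sketched as follows; the encoder size, prompt count, and readout are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of VPT-style tuning: learnable prompt tokens are prepended
# in the input space while the transformer backbone stays frozen.
import torch
import torch.nn as nn

embed_dim, num_patches, num_prompts = 768, 196, 8

# Stand-in for a pre-trained, frozen transformer backbone.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True),
    num_layers=2,
)
for p in backbone.parameters():
    p.requires_grad = False

# The only trainable parameters: prompt tokens and a classification head.
prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
head = nn.Linear(embed_dim, 100)

patch_tokens = torch.randn(4, num_patches, embed_dim)   # embedded image patches
x = torch.cat([prompts.expand(4, -1, -1), patch_tokens], dim=1)
features = backbone(x)
logits = head(features[:, 0])                           # read out a prompt token
print(logits.shape)                                     # torch.Size([4, 100])
```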