E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
- URL: http://arxiv.org/abs/2307.13770v1
- Date: Tue, 25 Jul 2023 19:03:21 GMT
- Title: E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
- Authors: Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan
Qi, Dongfang Liu
- Abstract summary: Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
- Score: 55.50908600818483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the size of transformer-based models continues to grow, fine-tuning these
large-scale pretrained vision models for new tasks has become increasingly
parameter-intensive. Parameter-efficient learning has been developed to reduce
the number of tunable parameters during fine-tuning. Although these methods
show promising results, there is still a significant performance gap compared
to full fine-tuning. To address this challenge, we propose an Effective and
Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale
transformer-based model adaptation. Specifically, we introduce a set of
learnable key-value prompts and visual prompts into self-attention and input
layers, respectively, to improve the effectiveness of model fine-tuning.
Moreover, we design a prompt pruning procedure to systematically prune low
importance prompts while preserving model performance, which largely enhances
the model's efficiency. Empirical results demonstrate that our approach
outperforms several state-of-the-art baselines on two benchmarks, with
considerably low parameter usage (e.g., 0.32% of model parameters on VTAB-1k).
Our code is available at https://github.com/ChengHan111/E2VPT.
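The two ingredients the abstract describes — learnable key-value prompts concatenated inside self-attention, and importance-based prompt pruning — can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's actual implementation; shapes, the single-head attention, and the mean-magnitude importance score are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kv_prompts(q, k, v, k_prompt, v_prompt):
    """Single-head self-attention where a few learnable key/value
    prompt vectors are concatenated to the frozen keys and values."""
    k_aug = np.concatenate([k_prompt, k], axis=0)   # (p + n, d)
    v_aug = np.concatenate([v_prompt, v], axis=0)   # (p + n, d)
    scores = q @ k_aug.T / np.sqrt(q.shape[-1])     # (n, p + n)
    return softmax(scores) @ v_aug                  # (n, d)

def prune_prompts(prompts, importance, keep):
    """Keep only the `keep` highest-importance prompt tokens; a generic
    importance-based pruning stand-in for the paper's procedure."""
    order = np.argsort(importance)[::-1][:keep]
    return prompts[np.sort(order)]

rng = np.random.default_rng(0)
n, d, p = 8, 16, 4                        # tokens, dim, prompt count
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
kp, vp = rng.normal(size=(p, d)), rng.normal(size=(p, d))
out = attention_with_kv_prompts(q, k, v, kp, vp)
print(out.shape)                          # (8, 16)
kept = prune_prompts(kp, np.abs(kp).mean(axis=1), keep=2)
print(kept.shape)                         # (2, 16)
```

Only the prompt tensors (here `kp`, `vp`) would receive gradients during fine-tuning; the backbone's `q`, `k`, `v` projections stay frozen.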
Related papers
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves comparable or even superior performance compared to existing PEFT methods.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies.
Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
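The abstract does not spell out DPPA's two-stage procedure, but the core operation — keeping roughly 20% of domain-specific parameters — can be illustrated with generic magnitude pruning over the fine-tuned weight deltas. The threshold rule and the use of deltas are assumptions for illustration only.

```python
import numpy as np

def magnitude_prune(delta, keep_ratio=0.2):
    """Zero out all but the largest-magnitude `keep_ratio` fraction of
    domain-specific parameters (fine-tuned minus base weights).
    A generic magnitude-pruning sketch, not DPPA's actual method."""
    flat = np.abs(delta).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]   # k-th largest magnitude
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

rng = np.random.default_rng(1)
delta = rng.normal(size=(10, 10))            # toy fine-tuning delta
pruned = magnitude_prune(delta, keep_ratio=0.2)
print((pruned != 0).mean())                  # 0.2
```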
arXiv Detail & Related papers (2024-03-05T09:12:49Z)
- Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models [21.17021844323919]
We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters.
We find that FFT leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale.
arXiv Detail & Related papers (2024-01-01T15:30:19Z)
- Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters [75.28536311904489]
We develop a new type of prompt, Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation.
On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.
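The parameter savings of a low-rank prompt come from factorizing the prompt matrix. The sketch below shows the arithmetic with illustrative sizes (token count, embedding dimension, and rank are assumptions, not values from the paper).

```python
import numpy as np

rng = np.random.default_rng(2)
n_tokens, dim, rank = 10, 768, 2           # illustrative sizes

# A full prompt costs n_tokens * dim parameters; a low-rank
# re-parameterization P = A @ B costs rank * (n_tokens + dim).
A = rng.normal(size=(n_tokens, rank))
B = rng.normal(size=(rank, dim))
prompt = A @ B                              # (10, 768) from few parameters

full_params = n_tokens * dim
low_rank_params = rank * (n_tokens + dim)
print(full_params, low_rank_params)         # 7680 1556
```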
arXiv Detail & Related papers (2023-12-17T20:42:43Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes [47.880781811936345]
We propose a novel framework for fine-tuning pretrained language models (LMs).
Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
arXiv Detail & Related papers (2022-11-24T14:38:08Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method, termed SSF, in which one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
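The SSF operation itself is a per-channel affine transform of the frozen backbone's features. A minimal sketch, assuming the standard identity initialization (scale of ones, shift of zeros):

```python
import numpy as np

def ssf(features, gamma, beta):
    """SSF-style adaptation: per-channel scale and shift of frozen
    backbone features; only gamma and beta are trained."""
    return gamma * features + beta

rng = np.random.default_rng(3)
feats = rng.normal(size=(4, 32))    # (tokens, channels) from frozen model
gamma = np.ones(32)                 # identity at initialization
beta = np.zeros(32)
print(np.allclose(ssf(feats, gamma, beta), feats))   # True
```

With this initialization, training starts from exactly the pretrained features, and the tunable parameter count is just twice the channel dimension per adapted layer.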
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
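The VPT mechanism summarized above amounts to prepending trainable tokens to the input sequence of a frozen transformer. A rough sketch of the idea and its parameter budget (the sizes and the approximate ViT-B/16 parameter count are illustrative assumptions):

```python
import numpy as np

def prepend_prompts(patch_tokens, prompt_tokens):
    """VPT-style input-space prompting: learnable prompt tokens join
    the patch-embedding sequence; the backbone stays frozen."""
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)

dim, n_patches, n_prompts = 768, 196, 10
rng = np.random.default_rng(4)
seq = prepend_prompts(rng.normal(size=(n_patches, dim)),
                      rng.normal(size=(n_prompts, dim)))
print(seq.shape)                            # (206, 768)

backbone_params = 86e6                      # roughly ViT-B/16
prompt_params = n_prompts * dim
print(prompt_params / backbone_params)      # ~9e-05, well under 1%
```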
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.