Facing the Elephant in the Room: Visual Prompt Tuning or Full
Finetuning?
- URL: http://arxiv.org/abs/2401.12902v1
- Date: Tue, 23 Jan 2024 16:48:18 GMT
- Title: Facing the Elephant in the Room: Visual Prompt Tuning or Full
Finetuning?
- Authors: Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan
Qi, Dongfang Liu
- Abstract summary: Visual Prompt Tuning is a parameter-efficient transfer learning technique.
We conduct a comprehensive analysis across 19 distinct datasets and tasks.
Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.
- Score: 92.23438255540968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the scale of vision models continues to grow, the emergence of Visual
Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has
gained attention due to its superior performance compared to traditional
full-finetuning. However, the conditions favoring VPT (the "when") and the
underlying rationale (the "why") remain unclear. In this paper, we conduct a
comprehensive analysis across 19 distinct datasets and tasks. To understand the
"when" aspect, we identify the scenarios where VPT proves favorable by two
dimensions: task objectives and data distributions. We find that VPT is
preferable when there is 1) a substantial disparity between the original and
the downstream task objectives (e.g., transitioning from classification to
counting), or 2) a similarity in data distributions between the two tasks
(e.g., both involve natural images). In exploring the "why" dimension, our
results indicate VPT's success cannot be attributed solely to overfitting and
optimization considerations. The unique way VPT preserves original features and
adds parameters appears to be a pivotal factor. Our study provides insights
into VPT's mechanisms, and offers guidance for its optimal utilization.
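To make the mechanism discussed above concrete, the sketch below illustrates the basic VPT setup also described in the original Visual Prompt Tuning paper listed under related work: a small set of learnable prompt tokens is prepended to the patch embeddings while the pretrained backbone stays frozen, so the original features are preserved and only the added prompt parameters are trained. The module name, the `backbone` interface, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptedFrozenViT(nn.Module):
    """Minimal VPT-style sketch (illustrative, not the paper's code):
    learnable prompt tokens are prepended to the patch embeddings while
    the pretrained transformer backbone is kept frozen."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 768, num_prompts: int = 10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # freeze all pretrained weights
            p.requires_grad = False
        # the only new trainable parameters: a handful of prompt embeddings
        self.prompts = nn.Parameter(torch.empty(1, num_prompts, embed_dim))
        nn.init.uniform_(self.prompts, -0.1, 0.1)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim) from the frozen patch embedder
        prompts = self.prompts.expand(patch_tokens.shape[0], -1, -1)
        # original patch features are left untouched; prompts are added alongside them
        tokens = torch.cat([prompts, patch_tokens], dim=1)
        return self.backbone(tokens)              # assumed to accept a token sequence
```

Only `self.prompts` (plus any task head, omitted here) receives gradients, which is why this style of tuning typically adds well under 1% of the model's parameters.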
Related papers
- Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning [27.703316805290843]
Visual Prompt Tuning (VPT) has emerged as a powerful method for adapting pre-trained vision models to downstream tasks.
We propose Visual Adaptive Prompt Tuning (VAPT), a new generation of prompts that redefines prompts as adaptive functions of the input.
Our theoretical analysis shows that VAPT achieves optimal sample efficiency.
arXiv Detail & Related papers (2025-01-31T07:41:06Z) - Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning [13.384550074613717]
Visual Prompt Tuning is noted for its superior performance compared to full fine-tuning.
SHIP significantly improves performance, achieving a 4.9% gain in accuracy over VPT with a ViT-B/16 backbone on VTAB-1k tasks.
arXiv Detail & Related papers (2024-12-22T10:28:52Z) - Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information (a rough sketch of this idea appears after this list).
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task [15.642102189777072]
Cross Visual Prompt Tuning (CVPT) is a new variant of visual prompt-based fine-tuning.
CVPT computes cross-attention between the prompt tokens and the embedded tokens, capturing the semantic relationship between them.
CVPT significantly improves VPT's performance and efficiency in visual tasks.
arXiv Detail & Related papers (2024-08-27T11:07:19Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks.
We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors.
Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z) - Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs in the Few-shot Learning tasks.
Its key novelties are the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA).
We conduct extensive experiments to show the efficacy of our model.
arXiv Detail & Related papers (2023-01-06T08:42:05Z) - Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)
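As referenced in the Visual Fourier Prompt Tuning entry above, the following sketch shows one way prompt embeddings can carry both spatial- and frequency-domain information by passing the prompts through a Fast Fourier Transform. The module name, the use of `torch.fft.fft2`, keeping only the real part, and all shapes are assumptions made for illustration, not the VFPT implementation.

```python
import torch
import torch.nn as nn

class FourierPrompts(nn.Module):
    """Illustrative sketch: prompts combining spatial- and frequency-domain
    views, loosely inspired by the Fourier-prompt idea (not the VFPT code)."""

    def __init__(self, embed_dim: int = 768, num_prompts: int = 10):
        super().__init__()
        self.spatial = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, batch_size: int) -> torch.Tensor:
        # frequency-domain view: 2D FFT over the prompt and embedding dimensions;
        # the real part keeps the result compatible with ordinary token embeddings
        freq = torch.fft.fft2(self.spatial, dim=(-2, -1)).real
        prompts = torch.cat([self.spatial, freq], dim=1)   # (1, 2*num_prompts, embed_dim)
        return prompts.expand(batch_size, -1, -1)
```

In this sketch the resulting prompts would then be prepended to the patch tokens exactly as in the earlier VPT example.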