Related papers: Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

URL: http://arxiv.org/abs/2401.12902v1
Date: Tue, 23 Jan 2024 16:48:18 GMT
Title: Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?
Authors: Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu
Abstract summary: Visual Prompt Tuning is a parameter-efficient transfer learning technique. We conduct a comprehensive analysis across 19 distinct datasets and tasks. Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.
Score: 92.23438255540968
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning. However, the conditions favoring VPT (the ``when") and the underlying rationale (the ``why") remain unclear. In this paper, we conduct a comprehensive analysis across 19 distinct datasets and tasks. To understand the ``when" aspect, we identify the scenarios where VPT proves favorable by two dimensions: task objectives and data distributions. We find that VPT is preferrable when there is 1) a substantial disparity between the original and the downstream task objectives (e.g., transitioning from classification to counting), or 2) a similarity in data distributions between the two tasks (e.g., both involve natural images). In exploring the ``why" dimension, our results indicate VPT's success cannot be attributed solely to overfitting and optimization considerations. The unique way VPT preserves original features and adds parameters appears to be a pivotal factor. Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.

Related papers

Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning [27.703316805290843]
Visual Prompt Tuning (VPT) has emerged as a powerful method for adapting pre-trained vision models to downstream tasks. We propose Visual Adaptive Prompt Tuning (VAPT), a new generation of prompts that redefines prompts as adaptive functions of the input. Our theoretical analysis shows that VAPT achieves optimal sample efficiency.
arXiv Detail & Related papers (2025-01-31T07:41:06Z)
Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning [13.384550074613717]
Visual Prompt Tuning is noted for its superior performance compared to full fine-tuning. Ship significantly improves performance, achieving a 4.9% gain in accuracy over VPT with a ViT-B/16 backbone on VTAB-1k tasks.
arXiv Detail & Related papers (2024-12-22T10:28:52Z)
Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models. Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information. Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task [15.642102189777072]
Cross Visual Prompt Tuning is a new type of visual fine-tuning. CVPT calculates cross-attention between the prompt tokens and the embedded tokens, which allows us to compute the semantic relationship between them. CVPT significantly improves VPT's performance and efficiency in visual tasks.
arXiv Detail & Related papers (2024-08-27T11:07:19Z)
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks. We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailed for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z)
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning. VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence. On ImageNet, VeCAF uses up to 3.3x less training batches to reach the target performance compared to full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z)
Denoising Vision Transformers [43.03068202384091]
We propose a two-stage denoising approach, termed Denoising Vision Transformers (DVT) In the first stage, we separate the clean features from those contaminated by positional artifacts by enforcing cross-view feature consistency with neural fields on a per-image basis. In the second stage, we train a lightweight transformer block to predict clean features from raw ViT outputs, leveraging the derived estimates of the clean features as supervision.
arXiv Detail & Related papers (2024-01-05T18:59:52Z)
Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs in the Few-shot Learning tasks. Key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA) We conduct extensive experiments to show the efficacy of our model.
arXiv Detail & Related papers (2023-01-06T08:42:05Z)
Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.