Related papers: Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning

Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning

URL: http://arxiv.org/abs/2501.18936v2
Date: Tue, 04 Feb 2025 05:30:35 GMT
Title: Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning
Authors: Minh Le, Anh Nguyen, Huy Nguyen, Chau Nguyen, Nhat Ho,
Abstract summary: Visual Prompt Tuning (VPT) has emerged as a powerful method for adapting pre-trained vision models to downstream tasks.<n>We propose Visual Adaptive Prompt Tuning (VAPT), a new generation of prompts that redefines prompts as adaptive functions of the input.<n>Our theoretical analysis shows that VAPT achieves optimal sample efficiency.
Score: 27.703316805290843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual Prompt Tuning (VPT) has recently emerged as a powerful method for adapting pre-trained vision models to downstream tasks. By introducing learnable prompt tokens as task-specific instructions, VPT effectively guides pre-trained transformer models with minimal overhead. Despite its empirical success, a comprehensive theoretical understanding of VPT remains an active area of research. Building on recent insights into the connection between mixture of experts and prompt-based approaches, we identify a key limitation in VPT: the restricted functional expressiveness in prompt formulation. To address this limitation, we propose Visual Adaptive Prompt Tuning (VAPT), a new generation of prompts that redefines prompts as adaptive functions of the input. Our theoretical analysis shows that this simple yet intuitive approach achieves optimal sample efficiency. Empirical results on VTAB-1K and FGVC further demonstrate VAPT's effectiveness, with performance gains of 7.34% and 1.04% over fully fine-tuning baselines, respectively. Notably, VAPT also surpasses VPT by a substantial margin while using fewer parameters. These results highlight both the effectiveness and efficiency of our method and pave the way for future research to explore the potential of adaptive prompts.

Related papers

Visual Instance-aware Prompt Tuning [21.538712755298413]
Visual Prompt Tuning (VPT) has emerged as a parameter-efficient fine-tuning paradigm for vision transformers.<n>We propose Visual Instance-aware Prompt Tuning (ViaPT), which generates instance-aware prompts based on each individual input.<n>ViaPT overcomes limitations by balancing dataset-level and instance-level knowledge, while reducing the amount of learnable parameters.
arXiv Detail & Related papers (2025-07-10T14:23:15Z)
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning [31.84894613827193]
We propose PRO-VPT (iterative Prompt RelOcation-based VPT), which adaptively adjusts the distribution building upon a nested optimization formulation. Pro-VPT surpasses VPT by 1.6% average accuracy, leading prompt-based methods to state-of-the-art performance on the VTAB-1k benchmark.
arXiv Detail & Related papers (2025-03-10T04:07:43Z)
Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models. Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information. Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts [36.88984387787463]
We study the theoretical foundations of prompt-based techniques for fine-tuning large pre-trained models.<n>We show that re parameterization is not merely an engineering trick but is grounded in deep theoretical foundations.<n>Our findings provide theoretical and empirical contributions, advancing the understanding of prompt-based methods.
arXiv Detail & Related papers (2024-10-03T04:30:24Z)
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task [15.642102189777072]
Cross Visual Prompt Tuning is a new type of visual fine-tuning. CVPT calculates cross-attention between the prompt tokens and the embedded tokens, which allows us to compute the semantic relationship between them. CVPT significantly improves VPT's performance and efficiency in visual tasks.
arXiv Detail & Related papers (2024-08-27T11:07:19Z)
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection [52.100335904875614]
We present a novel prompt tuning approach, namely, Decomposed Context Optimization (DeCoOp), which introduces new-class detectors and sub-classifiers to further enhance the base-class and new-class discriminability. Experimental results on 11 benchmark datasets validate the effectiveness of DePT and demonstrate that DeCoOp outperforms current state-of-the-art methods, providing a significant 2% average accuracy improvement.
arXiv Detail & Related papers (2024-06-01T07:46:42Z)
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation. DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores the correlation evolvement between prompts and patch tokens during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark [97.8968058408759]
Pre-trained vision models (PVMs) have demonstrated remarkable adaptability across a wide range of downstream vision tasks.<n>As these models scale to billions or even trillions of parameters, conventional full fine-tuning has become increasingly impractical due to its high computational and storage demands.<n> parameter-efficient fine-tuning (PEFT) has emerged as a promising alternative, aiming to achieve performance comparable to full fine-tuning while making minimal adjustments to the model parameters.
arXiv Detail & Related papers (2024-02-03T19:12:20Z)
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation. Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks. We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
Do We Really Need a Large Number of Visual Prompts? [23.85637456240694]
We analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. We propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts.
arXiv Detail & Related papers (2023-05-26T19:31:57Z)
Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.