PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
- URL: http://arxiv.org/abs/2503.06901v2
- Date: Mon, 06 Oct 2025 13:38:48 GMT
- Title: PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
- Authors: Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, Yiu-ming Cheung
- Abstract summary: We introduce adaptive distribution optimization (ADO) by tackling two key questions: (1) How to appropriately and formally define ADO, and (2) How to design an adaptive distribution strategy guided by this definition? We propose a new VPT framework termed PRO-VPT, which adaptively adjusts the distribution built upon a nested optimization formulation. Our proposal can adaptively learn the optimal prompt distribution in a nested optimization-based manner, thereby unlocking the full potential of VPT.
- Score: 53.32478229070946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual prompt tuning (VPT), i.e., fine-tuning some lightweight prompt tokens, provides an efficient and effective approach for adapting pre-trained models to various downstream tasks. However, most prior art indiscriminately uses a fixed prompt distribution across different tasks, neglecting the importance of each block varying depending on the task. In this paper, we introduce adaptive distribution optimization (ADO) by tackling two key questions: (1) How to appropriately and formally define ADO, and (2) How to design an adaptive distribution strategy guided by this definition? Through empirical analysis, we first confirm that properly adjusting the distribution significantly improves VPT performance, and further uncover a key insight that a nested relationship exists between ADO and VPT. Based on these findings, we propose a new VPT framework, termed PRO-VPT (iterative Prompt RelOcation-based VPT), which adaptively adjusts the distribution built upon a nested optimization formulation. Specifically, we develop a prompt relocation strategy derived from this formulation, comprising two steps: pruning idle prompts from prompt-saturated blocks, followed by allocating these prompts to the most prompt-needed blocks. By iteratively performing prompt relocation and VPT, our proposal can adaptively learn the optimal prompt distribution in a nested optimization-based manner, thereby unlocking the full potential of VPT. Extensive experiments demonstrate that our proposal significantly outperforms advanced VPT methods, e.g., PRO-VPT surpasses VPT by 1.6 pp and 2.0 pp average accuracy, leading prompt-based methods to state-of-the-art performance on VTAB-1k and FGVC benchmarks. The code is available at https://github.com/ckshang/PRO-VPT.
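The pruning-then-allocation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-block need scores and the number of prompts moved per round are assumptions, and `relocate_prompts` is a hypothetical helper name.

```python
import numpy as np

def relocate_prompts(counts, scores, k=1):
    """One relocation round: prune k prompts from the most prompt-saturated
    block (lowest need score) and allocate them to the most prompt-needed
    block (highest need score), keeping the total prompt budget fixed.

    counts[b] -- number of prompt tokens currently assigned to block b
    scores[b] -- a per-block need score (assumed; higher = more prompt-needed)
    """
    counts = counts.copy()
    donor = int(np.argmin(scores))       # prompt-saturated block
    recipient = int(np.argmax(scores))   # most prompt-needed block
    moved = min(k, counts[donor])        # prune idle prompts from the donor
    counts[donor] -= moved
    counts[recipient] += moved           # allocate them to the recipient
    return counts

# e.g. relocate_prompts(np.array([4, 4, 4]), np.array([0.1, 0.5, 0.9]), k=2)
```

Iterating such a relocation step with ordinary VPT training, as the abstract describes, would approximate the nested optimization: the inner loop tunes prompts under a fixed distribution, while the outer loop adjusts the distribution itself.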
Related papers
- RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling [59.088798018184235]
RAPO++ is a cross-stage prompt optimization framework. It unifies training-data-aligned refinement, test-time iterative scaling, and large language model fine-tuning. RAPO++ achieves significant gains in semantic alignment, compositional reasoning, temporal stability, and physical plausibility.
arXiv Detail & Related papers (2025-10-23T04:45:09Z) - Visual Instance-aware Prompt Tuning [21.538712755298413]
Visual Prompt Tuning (VPT) has emerged as a parameter-efficient fine-tuning paradigm for vision transformers. We propose Visual Instance-aware Prompt Tuning (ViaPT), which generates instance-aware prompts based on each individual input. ViaPT overcomes limitations by balancing dataset-level and instance-level knowledge, while reducing the amount of learnable parameters.
arXiv Detail & Related papers (2025-07-10T14:23:15Z) - Attention to the Burstiness in Visual Prompt Tuning! [10.857651069130979]
Visual Prompt Tuning (VPT) is a fine-tuning technique that adapts a pre-trained vision Transformer (ViT) by learning a small set of parameters in the input space, known as prompts. In VPT, we uncover "burstiness" in the values arising from the interaction of image patch embeddings. We propose whitening the data, de-correlating it and equalizing its variance towards a more Gaussian distribution before learning prompts.
arXiv Detail & Related papers (2025-06-28T14:45:37Z) - SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition [69.58329995485158]
Recent studies show that visual place recognition (VPR) methods using pre-trained visual foundation models can achieve promising performance. We propose a novel method to realize seamless adaptation of foundation models to VPR. In pursuit of higher efficiency and better performance, we propose an extension of SelaVPR, called SelaVPR++.
arXiv Detail & Related papers (2025-02-23T15:01:09Z) - Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning [27.703316805290843]
Visual Prompt Tuning (VPT) has emerged as a powerful method for adapting pre-trained vision models to downstream tasks. We propose Visual Adaptive Prompt Tuning (VAPT), a new generation of prompts that redefines prompts as adaptive functions of the input. Our theoretical analysis shows that VAPT achieves optimal sample efficiency.
arXiv Detail & Related papers (2025-01-31T07:41:06Z) - CVPT: Cross Visual Prompt Tuning [15.642102189777072]
Cross Visual Prompt Tuning (CVPT) introduces a cross-attention module to model interactions between prompts and image tokens. CVPT achieves over 4% higher average accuracy, rivaling leading adapter-based methods in both performance and efficiency. Our work confirms that prompt-based methods can achieve exceptional results in visual fine-tuning.
arXiv Detail & Related papers (2024-08-27T11:07:19Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema [36.65009632307124]
We propose Free-form Instruction-oriented Prompt Optimization (FIPO) to improve the task performance of large language models (LLMs).
FIPO uses a modular APO template that dynamically integrates the naive task instruction, optional instruction responses, and optional ground truth to produce finely optimized prompts.
We validate FIPO framework across five public benchmarks and six testing models.
arXiv Detail & Related papers (2024-02-19T03:56:44Z) - Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores the correlation evolvement between prompts and patch tokens during proficient training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z) - Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning? [92.23438255540968]
Visual Prompt Tuning is a parameter-efficient transfer learning technique.
We conduct a comprehensive analysis across 19 distinct datasets and tasks.
Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.
arXiv Detail & Related papers (2024-01-23T16:48:18Z) - AutoVP: An Automated Visual Prompting Framework and Benchmark [66.5618543577204]
Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach to adapting pre-trained vision models to solve various downstream image-classification tasks.
We propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks.
Our experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin.
arXiv Detail & Related papers (2023-10-12T14:55:31Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.