Visual Instance-aware Prompt Tuning
- URL: http://arxiv.org/abs/2507.07796v1
- Date: Thu, 10 Jul 2025 14:23:15 GMT
- Title: Visual Instance-aware Prompt Tuning
- Authors: Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu
- Abstract summary: Visual Prompt Tuning (VPT) has emerged as a parameter-efficient fine-tuning paradigm for vision transformers. We propose Visual Instance-aware Prompt Tuning (ViaPT), which generates instance-aware prompts based on each individual input. ViaPT overcomes limitations by balancing dataset-level and instance-level knowledge, while reducing the number of learnable parameters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Prompt Tuning (VPT) has emerged as a parameter-efficient fine-tuning paradigm for vision transformers, with conventional approaches utilizing dataset-level prompts that remain the same across all input instances. We observe that this strategy results in sub-optimal performance due to the high variance in downstream datasets. To address this challenge, we propose Visual Instance-aware Prompt Tuning (ViaPT), which generates instance-aware prompts based on each individual input and fuses them with dataset-level prompts, leveraging Principal Component Analysis (PCA) to retain important prompting information. Moreover, we reveal that VPT-Deep and VPT-Shallow represent two conceptual corner cases that fail to effectively capture instance-specific information, while random dimension reduction on prompts only yields performance between the two extremes. Instead, ViaPT overcomes these limitations by balancing dataset-level and instance-level knowledge, while reducing the number of learnable parameters compared to VPT-Deep. Extensive experiments across 34 diverse datasets demonstrate that our method consistently outperforms state-of-the-art baselines, establishing a new paradigm for analyzing and optimizing visual prompts for vision transformers.
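The abstract does not specify exactly where PCA enters the pipeline, so the following PyTorch sketch is only one plausible reading: a hypothetical `instance_head` predicts per-input prompts from pooled patch embeddings, these are fused with a shared dataset-level prompt, and a truncated SVD (the standard way to compute a PCA projection) keeps the top `keep` principal components of the fused prompts. All names and shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ViaPTPromptLayer(nn.Module):
    """Minimal sketch of instance-aware prompt fusion; names/shapes are assumptions."""

    def __init__(self, num_prompts: int = 8, dim: int = 768, keep: int = 4):
        super().__init__()
        # Shared dataset-level prompt, as in conventional VPT
        self.dataset_prompt = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # Hypothetical generator mapping an image summary to per-instance prompts
        self.instance_head = nn.Linear(dim, num_prompts * dim)
        self.keep = keep

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim) embeddings from the frozen backbone
        B, _, dim = patch_tokens.shape
        pooled = patch_tokens.mean(dim=1)                    # (B, dim)
        inst = self.instance_head(pooled).view(B, -1, dim)   # (B, P, dim)
        fused = self.dataset_prompt.unsqueeze(0) + inst      # fuse both prompt sources
        # PCA-style reduction: keep only the top-k principal components per instance
        mean = fused.mean(dim=1, keepdim=True)
        U, S, Vh = torch.linalg.svd(fused - mean, full_matrices=False)
        low_rank = (U[:, :, :self.keep] * S[:, None, :self.keep]) @ Vh[:, :self.keep, :]
        return low_rank + mean   # prompts to prepend to the token sequence
```

In use, these prompts would be concatenated with the patch tokens before the (frozen) transformer blocks, exactly as dataset-level prompts are in standard VPT.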
Related papers
- DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
We leverage metric learning techniques to investigate how the distribution of prompts affects fine-tuning performance. We propose a novel framework, Distribution Aware Visual Prompt Tuning (DA-VPT), to guide the distributions of the prompts. Our method demonstrates that the prompts can serve as an effective bridge to share semantic information between image patches and the class token.
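The summary does not give DA-VPT's actual objective; the sketch below is a generic triplet-style metric-learning loss on prompt features, with `class_prototypes` (e.g., running means of per-class [CLS] features) as an assumed construct, meant only to illustrate how such a loss can pull prompt representations toward class semantics.

```python
import torch
import torch.nn.functional as F

def prompt_alignment_loss(prompts, labels, class_prototypes, margin=0.5):
    """Hedged sketch of a metric-learning objective on prompts (not the paper's loss).

    prompts:          (B, P, D) per-sample prompt embeddings
    labels:           (B,) long tensor of class indices
    class_prototypes: (C, D) assumed per-class anchors, e.g. running [CLS] means
    """
    p = F.normalize(prompts.mean(dim=1), dim=-1)        # (B, D) pooled prompt feature
    protos = F.normalize(class_prototypes, dim=-1)      # (C, D)
    sims = p @ protos.T                                 # (B, C) cosine similarities
    pos = sims.gather(1, labels[:, None]).squeeze(1)    # similarity to own class
    neg = sims.scatter(1, labels[:, None], float('-inf')).max(dim=1).values
    return F.relu(margin - pos + neg).mean()            # hinge: pull own class, push rest
```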
arXiv Detail & Related papers (2025-05-29T17:31:26Z)
- Visual Variational Autoencoder Prompt Tuning
This paper introduces V²APT (Visual Variational Autoencoder Prompt Tuning), a novel framework that generates dynamic, input-dependent prompts. Experiments on FGVC, HTA, and VTAB-1k benchmarks demonstrate that our approach consistently outperforms state-of-the-art PEFT methods.
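Only the high-level idea (a VAE that produces input-dependent prompts) is stated, so this is a minimal sketch under assumed shapes: a per-image latent is sampled with the reparameterization trick and decoded into prompt tokens, with the usual KL term returned for the training loss.

```python
import torch
import torch.nn as nn

class VAEPromptGenerator(nn.Module):
    """Minimal sketch of VAE-style input-dependent prompts; architecture assumed."""

    def __init__(self, dim=768, latent=64, num_prompts=8):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)            # predicts mu and log-variance
        self.dec = nn.Linear(latent, num_prompts * dim)  # decodes latent into prompts
        self.num_prompts, self.dim = num_prompts, dim

    def forward(self, patch_tokens):
        pooled = patch_tokens.mean(dim=1)                      # (B, dim) image summary
        mu, logvar = self.enc(pooled).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        prompts = self.dec(z).view(-1, self.num_prompts, self.dim)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return prompts, kl                                     # add kl to the task loss
```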
arXiv Detail & Related papers (2025-03-22T04:59:51Z)
- On the Expressiveness of Visual Prompt Experts
Visual Prompt Tuning (VPT) has proven effective for parameter-efficient adaptation of pre-trained vision models to downstream tasks by inserting task-specific learnable prompt tokens. We propose Visual Adaptive Prompt Tuning (VAPT), a novel method that endows prompt experts with enhanced expressiveness while preserving parameter efficiency.
arXiv Detail & Related papers (2025-01-31T07:41:06Z)
- Visual Fourier Prompt Tuning
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
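The summary says VFPT injects the Fast Fourier Transform into prompt embeddings so that both spatial- and frequency-domain information are used; the exact recipe is not given here, so the sketch below simply routes half of the prompt tokens through a 2D FFT (the split ratio and shapes are assumptions).

```python
import torch
import torch.nn as nn

class FourierPrompt(nn.Module):
    """Sketch of Fourier-augmented prompts; the half/half split is an assumption."""

    def __init__(self, num_prompts=8, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, batch_size):
        half = self.prompts.shape[0] // 2
        spatial = self.prompts[:half]                    # untouched spatial-domain tokens
        # 2D FFT over the (prompt, dim) grid; keep the real part so dtypes match
        freq = torch.fft.fft2(self.prompts[half:]).real
        out = torch.cat([spatial, freq], dim=0)          # (P, dim) combined prompts
        return out.unsqueeze(0).expand(batch_size, -1, -1)
```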
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- CVPT: Cross Visual Prompt Tuning
Cross Visual Prompt Tuning (CVPT) introduces a cross-attention module to model interactions between prompts and image tokens. CVPT achieves over 4% higher average accuracy, rivaling leading adapter-based methods in both performance and efficiency. Our work confirms that prompt-based methods can achieve exceptional results in visual fine-tuning.
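Cross-attention between prompts and image tokens is a standard construction, sketched below with prompts as queries and image tokens as keys/values; the head count and dimensions are illustrative, not CVPT's actual configuration.

```python
import torch
import torch.nn as nn

class CrossPromptAttention(nn.Module):
    """Sketch of prompt-to-image cross-attention; hyperparameters are illustrative."""

    def __init__(self, dim=768, heads=8, num_prompts=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, image_tokens):                     # image_tokens: (B, N, dim)
        q = self.prompts.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        updated, _ = self.attn(q, image_tokens, image_tokens)
        return updated                                   # prompts refined by image content
```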
arXiv Detail & Related papers (2024-08-27T11:07:19Z)
- Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?
Visual Prompt Tuning is a parameter-efficient transfer learning technique.
We conduct a comprehensive analysis across 19 distinct datasets and tasks.
Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.
arXiv Detail & Related papers (2024-01-23T16:48:18Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Explicit Visual Prompting for Low-Level Structure Segmentations
We propose a new visual prompting model named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters.
EVP also achieves state-of-the-art performances on diverse low-level structure segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
- Diversity-Aware Meta Visual Prompting
We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient prompting method for transferring pre-trained models to downstream tasks with a frozen backbone.
We cluster the downstream dataset into small subsets in a diversity-adaptive way, with each subset having its own prompt optimized separately.
All the prompts are optimized with a meta-prompt, which is learned across several datasets.
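Given the described pipeline (cluster the dataset, give each cluster its own prompt, initialize all prompts from a meta-learned one), inference reduces to routing each image to its nearest cluster's prompt. The sketch below assumes `prototypes` come from k-means over frozen-backbone features and `prompt_bank` is a (K, P, D) parameter tensor; both names are hypothetical.

```python
import torch

def assign_prompts(features, prototypes, prompt_bank):
    """Sketch of diversity-aware prompt routing (details assumed, not DAM-VP's code).

    features:    (B, D) frozen-backbone embeddings of the input images
    prototypes:  (K, D) cluster centers, e.g. from k-means on the training set
    prompt_bank: (K, P, D) one learnable prompt set per cluster, meta-initialized
    """
    dists = torch.cdist(features, prototypes)   # (B, K) Euclidean distances
    cluster_ids = dists.argmin(dim=1)           # nearest cluster per image
    return prompt_bank[cluster_ids]             # (B, P, D) per-image prompts
```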
arXiv Detail & Related papers (2023-03-14T17:59:59Z)
- Unified Vision and Language Prompt Learning
We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
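UPT's "tiny neural network" that jointly optimizes prompts across modalities is not specified in this summary; the sketch below uses one shared set of learnable tokens with two small MLP branches, so gradients from the text and image encoders shape a single underlying prompt representation. Sizes and branch design are assumptions.

```python
import torch
import torch.nn as nn

class UnifiedPromptGenerator(nn.Module):
    """Sketch of unified prompt tuning; the branch architecture is assumed."""

    def __init__(self, num_prompts=4, dim=512):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.to_text = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.to_visual = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self):
        # Both branches read the same learnable tokens, so text and visual
        # prompts are optimized jointly through one shared parameter set.
        return self.to_text(self.shared), self.to_visual(self.shared)
```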
arXiv Detail & Related papers (2022-10-13T17:50:24Z)