LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
- URL: http://arxiv.org/abs/2502.00896v2
- Date: Tue, 04 Feb 2025 03:36:34 GMT
- Title: LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
- Authors: Can Jin, Ying Li, Mingyu Zhao, Shiyu Zhao, Zhenting Wang, Xiaoxiao He, Ligong Han, Tong Che, Dimitris N. Metaxas
- Abstract summary: We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP).
LoR-VP enables shared and patch-specific information across rows and columns of image pixels.
Experiments demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods.
- Score: 41.77434289193232
- Abstract: Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, limiting the interaction between the visual prompts and the original image to a small set of patches while neglecting the inductive bias present in shared information across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which enables shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods, achieving up to 6 times faster training, utilizing 18 times fewer visual prompt parameters, and delivering a 3.1% improvement in performance. The code is available at https://github.com/jincan333/LoR-VP.
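The low-rank idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero/Gaussian initialization, the rank of 4, and the 224x224 input size are all assumptions chosen for illustration. The sketch builds a per-channel prompt as the product of two thin matrices and adds it to the whole image, so every pixel is touched by the prompt while the parameter count stays far below that of a full-size additive prompt.

```python
import numpy as np

def init_low_rank_prompt(size, rank, channels=3, seed=0):
    """Create the two low-rank factors (hypothetical helper, not from the paper's code)."""
    rng = np.random.default_rng(seed)
    # Assumption: B starts at zero so the prompt is initially a no-op.
    B = np.zeros((channels, size, rank))
    # Assumption: A starts with small Gaussian noise.
    A = rng.normal(scale=0.01, size=(channels, rank, size))
    return B, A

def apply_low_rank_prompt(image, B, A):
    """Add the low-rank prompt B @ A to an image of shape (C, H, W)."""
    prompt = B @ A  # batched matmul over channels -> (C, H, W), rank <= r per channel
    return image + prompt

def count_params(B, A):
    """Number of prompt parameters that would be trained."""
    return B.size + A.size

size, rank = 224, 4
B, A = init_low_rank_prompt(size, rank)
image = np.random.default_rng(1).random((3, size, size))
prompted = apply_low_rank_prompt(image, B, A)

print(prompted.shape)       # (3, 224, 224)
print(count_params(B, A))   # 5376 parameters for the low-rank prompt
print(3 * size * size)      # 150528 parameters for a full-size additive prompt
```

Each row of B is shared by every pixel in that image row and each column of A by every pixel in that image column, which is one way to read "shared and patch-specific information across rows and columns". The parameter counts printed here follow from the assumed rank and image size only and are not the figures reported in the paper.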
Related papers
- Selective Visual Prompting in Vision Mamba [35.86547398432339]
Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks.
Existing visual prompting methods are predominantly tailored for Vision Transformer (ViT)-based models.
We introduce a novel Selective Visual Prompting (SVP) method specifically for the efficient fine-tuning of Vim.
arXiv Detail & Related papers (2024-12-12T05:24:06Z)
- Attention Prompting on Image for Large Vision-Language Models [63.794304207664176]
We propose a new prompting technique named Attention Prompting on Image.
We generate an attention heatmap for the input image dependent on the text query with an auxiliary model like CLIP.
Experiments on various vision-language benchmarks verify the effectiveness of our technique.
arXiv Detail & Related papers (2024-09-25T17:59:13Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Do We Really Need a Large Number of Visual Prompts? [23.85637456240694]
We analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture.
We propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts.
arXiv Detail & Related papers (2023-05-26T19:31:57Z)
- Progressive Visual Prompt Learning with Contrastive Feature Re-formation [15.385630262368661]
We propose a new Progressive Visual Prompt (ProVP) structure to strengthen the interactions among prompts of different layers.
Our ProVP could effectively propagate the image embeddings to deep layers and behave partially similar to an instance adaptive prompt method.
To the best of our knowledge, we are the first to demonstrate the superior performance of visual prompts in V-L models to previous prompt-based methods in downstream tasks.
arXiv Detail & Related papers (2023-04-17T15:54:10Z)
- Explicit Visual Prompting for Low-Level Structure Segmentations [55.51869354956533]
We propose a new visual prompting model, named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters.
EVP also achieves state-of-the-art performances on diverse low-level structure segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
- Unleashing the Power of Visual Prompting At the Pixel Level [28.50538386115006]
We show that the strategy of reconciling the prompt and the image matters, and find that warping the prompt around a properly shrunk image empirically works the best.
Using a CLIP model, our prompting method sets a new record of 82.8% average accuracy across 12 popular classification datasets.
arXiv Detail & Related papers (2022-12-20T18:57:06Z)
- CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models.
CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z)
- Unified Vision and Language Prompt Learning [86.1530128487077]
We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning.
A major finding is that text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances.
To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities.
arXiv Detail & Related papers (2022-10-13T17:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.