Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model
within 0.5K Parameters
- URL: http://arxiv.org/abs/2312.10813v2
- Date: Thu, 11 Jan 2024 12:51:12 GMT
- Authors: Tianxiang Hao, Mengyao Lyu, Hui Chen, Sicheng Zhao, Jungong Han,
Guiguang Ding
- Abstract summary: We develop a new type of prompt, Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation.
On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.
- Score: 75.28536311904489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of large pre-trained vision-language models, how to
effectively transfer the knowledge of such foundational models to downstream
tasks becomes a hot topic, especially in a data-deficient scenario. Recently,
prompt tuning has become a popular solution. When adapting the vision-language
models, researchers freeze the parameters in the backbone and only design and
tune the prompts. On the one hand, the delicate design of prompt tuning
exhibits strong performance. On the other hand, complicated structures and
update rules largely increase the computation and storage cost. Motivated by
the observation that the evolution pattern of the generalization capability in
visual-language models aligns harmoniously with the trend of rank variations in
the prompt matrix during adaptation, we design a new type of prompt,
Re-parameterized Low-rank Prompt (RLP), for both efficient and effective
adaptation. Our method could largely reduce the number of tunable parameters
and storage space, which is quite beneficial in resource-limited scenarios.
Extensive experiments further demonstrate the superiority of RLP. In
particular, RLP shows comparable or even stronger performance than the latest
state-of-the-art methods with an extremely small number of parameters. On a
series of tasks over 11 datasets, RLP significantly increases the average
downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K
parameters.
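To make the parameter budget concrete, here is a minimal sketch of a low-rank prompt, assuming the prompt matrix is composed as a (frozen or zero) base plus the product of two small factors. The function names, the factorization layout, and the sizes (4 prompt tokens, 512-dimensional embeddings, rank 1) are illustrative assumptions and are not taken from the abstract; the authors' actual re-parameterization may differ.

```python
def low_rank_prompt(u, v, base=None):
    """Compose an n x d prompt matrix as base + u @ v.

    u is n x r and v is r x d (lists of lists). `base` is an optional
    n x d matrix; this layout is an assumption made for illustration.
    """
    n, r, d = len(u), len(v), len(v[0])
    prompt = [
        [sum(u[i][k] * v[k][j] for k in range(r)) for j in range(d)]
        for i in range(n)
    ]
    if base is not None:
        prompt = [[prompt[i][j] + base[i][j] for j in range(d)] for i in range(n)]
    return prompt


def count_params(n_tokens, dim, rank):
    """Return (full, low_rank) tunable-parameter counts for an n x d prompt."""
    full = n_tokens * dim             # every entry of the prompt is tuned
    low_rank = rank * (n_tokens + dim)  # only the two factors are tuned
    return full, low_rank


# Hypothetical sizes: 4 prompt tokens, 512-dim embeddings, rank 1.
full, low = count_params(4, 512, 1)
print(full, low)  # 2048 516
```

Under these assumed sizes the factorized prompt has 516 tunable values, which is on the order of the 0.5K figure quoted above, against 2048 for a full prompt matrix of the same shape.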
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, and several alignment assumptions.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach [17.678759882763078]
Fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks.
Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features is a key challenge.
We propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy.
arXiv Detail & Related papers (2024-03-28T00:14:53Z)
- Enhancing Transformer RNNs with Multiple Temporal Perspectives [18.884124657093405]
We introduce the concept of multiple temporal perspectives, a novel approach applicable to Recurrent Neural Network (RNN) architectures.
This method involves maintaining diverse temporal views of previously encountered text, significantly enriching the language models' capacity to interpret context.
arXiv Detail & Related papers (2024-02-04T22:12:29Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose A graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges.
Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
- Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers [9.080472817672264]
Prompt Generation Network (PGN) generates high performing, input-dependent prompts by sampling from an end-to-end learned library of tokens.
We introduce a "prompt inversion" trick, with which PGNs can be efficiently trained in a latent space but deployed as strictly input-only prompts for inference.
It surpasses previous methods by a large margin on 12/12 datasets and even outperforms full fine-tuning on 5/12, while requiring 100x fewer parameters.
arXiv Detail & Related papers (2022-10-12T17:59:58Z)