Related papers: Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters

Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters

URL: http://arxiv.org/abs/2312.10813v2
Date: Thu, 11 Jan 2024 12:51:12 GMT
Title: Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters
Authors: Tianxiang Hao, Mengyao Lyu, Hui Chen, Sicheng Zhao, Jungong Han, Guiguang Ding
Abstract summary: We develop a new type of prompt, Re- parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation. On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.
Score: 75.28536311904489
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the development of large pre-trained vision-language models, how to effectively transfer the knowledge of such foundational models to downstream tasks becomes a hot topic, especially in a data-deficient scenario. Recently, prompt tuning has become a popular solution. When adapting the vision-language models, researchers freeze the parameters in the backbone and only design and tune the prompts. On the one hand, the delicate design of prompt tuning exhibits strong performance. On the other hand, complicated structures and update rules largely increase the computation and storage cost. Motivated by the observation that the evolution pattern of the generalization capability in visual-language models aligns harmoniously with the trend of rank variations in the prompt matrix during adaptation, we design a new type of prompt, Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation. Our method could largely reduce the number of tunable parameters and storage space, which is quite beneficial in resource-limited scenarios. Extensive experiments further demonstrate the superiority of RLP. In particular, RLP shows comparable or even stronger performance than the latest state-of-the-art methods with an extremely small number of parameters. On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.

Related papers

Weight Spectra Induced Efficient Model Adaptation [54.8615621415845]
Fine-tuning large-scale foundation models incurs prohibitive computational costs.<n>We show that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact.<n>We propose a novel method that leverages learnable rescaling of top singular directions.
arXiv Detail & Related papers (2025-05-29T05:03:29Z)
Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs [18.832135309689736]
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. Recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information. We develop a Position-Aware PAPEFT approach which is composed of a data augmentation technique and an efficient parameter adapter.
arXiv Detail & Related papers (2024-04-01T19:04:17Z)
Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities [0.9217021281095907]
DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework. DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification. We introduce the Importance Factor, a new learning-based metric that enhances the explainability of models trained with DAAM-based methods.
arXiv Detail & Related papers (2024-01-20T06:42:32Z)
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity [6.434967516411846]
We introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect. E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
arXiv Detail & Related papers (2023-10-24T15:27:15Z)
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time. We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with sample complexity.
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information. Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using preference ranking function to find the optimization direction. Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
HiFi: High-Information Attention Heads Hold for Parameter-Efficient Model Adaptation [0.8409934249521909]
We propose a parameter-efficient fine-tuning method HiFi, that is, only the highly informative and strongly correlated attention heads for the specific task are fine-tuned. We first model the relationship between heads into a graph from two perspectives of information richness and correlation, and then apply PageRank algorithm to determine the relative importance of each head. Experiments on the GLUE benchmark demonstrate the effectiveness of our method, and show that HiFi obtains state-of-the-art performance over the prior baselines.
arXiv Detail & Related papers (2023-05-08T09:31:13Z)
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
Information-theoretic Inducing Point Placement for High-throughput Bayesian Optimisation [9.732863739456036]
We propose a novel inducing point design that uses a principled information-theoretic criterion to select inducing points. By choosing inducing points to maximally reduce both global uncertainty and uncertainty in the maximum value of the objective function, we build surrogate models able to support high-precision high- throughput BO.
arXiv Detail & Related papers (2022-06-06T08:56:56Z)
Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU) Most existing continual learning approaches suffer from low accuracy and performance fluctuation. We propose a hyper parameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.