Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
- URL: http://arxiv.org/abs/2312.10813v3
- Date: Tue, 10 Sep 2024 20:15:55 GMT
- Title: Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
- Authors: Tianxiang Hao, Mengyao Lyu, Hui Chen, Sicheng Zhao, Xiaohan Ding, Jungong Han, Guiguang Ding
- Abstract summary: We propose the concept of ``Information Density'' (ID) to indicate whether a matrix strongly belongs to certain feature spaces.
We introduce the Dense Information Prompt (DIP) to enhance information density to improve generalization.
DIP significantly reduces the number of tunable parameters and the requisite storage space, making it particularly advantageous in resource-constrained settings.
- Score: 73.34532767873785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of large pre-trained vision-language models, effectively transferring the knowledge embedded within these foundational models to downstream tasks has become a pivotal topic, particularly in data-scarce environments. Recently, parameter-efficient fine-tuning approaches, especially prompt tuning, have garnered considerable attention. To better understand the nature of prompt tuning, we propose the concept of ``Information Density'' (ID) to indicate whether a matrix strongly belongs to certain feature spaces rather than being evenly distributed across various feature spaces. We suppose a higher ID with strong bias across some feature spaces naturally leads to excellent robustness and stability. Our research, inspired by the observation that generalizability is closely linked to the information density of the prompt matrix, introduces the Dense Information Prompt (DIP). DIP aims to enhance information density to improve generalization. Furthermore, DIP significantly reduces the number of tunable parameters and the requisite storage space, making it particularly advantageous in resource-constrained settings. Comprehensive experiments substantiate the superiority of DIP. Notably, DIP surpasses the latest state-of-the-art methods by a substantial margin with an exceptionally small parameter count. Across a range of tasks spanning 11 datasets, DIP improves the average downstream accuracy of classic prompt tuning by up to 5.76% using merely 0.5K parameters.
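The abstract does not spell out how Information Density is computed. One hedged reading is that a matrix "strongly belonging to certain feature spaces" corresponds to a concentrated singular-value spectrum, which the sketch below scores via spectral entropy; the function name `information_density` and the entropy-based proxy are assumptions of this note, not the paper's definition.

```python
# Hypothetical proxy for Information Density: score how concentrated the
# singular-value spectrum of a prompt matrix is (assumption of this note,
# not the paper's formula).
import torch

def information_density(prompt: torch.Tensor, eps: float = 1e-12) -> float:
    """Return a score in (0, 1]; higher means the spectrum is more concentrated."""
    s = torch.linalg.svdvals(prompt)              # energy per feature direction
    p = s / (s.sum() + eps)                       # normalize to a distribution
    entropy = -(p * (p + eps).log()).sum()        # Shannon entropy of the spectrum
    max_entropy = torch.log(torch.tensor(float(s.numel())))
    return float(1.0 - entropy / max_entropy)     # 1.0 = all energy in one direction

rank_one = torch.outer(torch.randn(16), torch.randn(512))   # strongly biased matrix
random_mat = torch.randn(16, 512)                            # evenly spread matrix
print(information_density(rank_one), information_density(random_mat))
```

Under this proxy, a near-rank-1 prompt matrix scores close to 1 while an i.i.d. Gaussian matrix of the same shape scores much lower, matching the intuition of strong bias toward a few feature directions.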
Related papers
- Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs [18.832135309689736]
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts.
Recent studies show a positional bias in LLMs: performance varies depending on where useful information is located in the input.
We develop a Position-Aware Parameter-Efficient Fine-Tuning (PAPEFT) approach composed of a data augmentation technique and an efficient parameter adapter.
arXiv Detail & Related papers (2024-04-01T19:04:17Z)
- Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities [0.9217021281095907]
DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework.
DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification.
We introduce the Importance Factor, a new learning-based metric that enhances the explainability of models trained with DAAM-based methods.
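The summary only states that learnable mean and variance enter the attention mechanism. The following is a minimal sketch of one way to realize that idea, assuming the Gaussian parameters modulate pre-softmax scores per head; the module name, parameterization, and placement are assumptions, not the authors' exact DAAM layer.

```python
# Minimal sketch (assumed design, not the published DAAM layer): per-head
# learnable mean and log-variance reweight attention scores with a Gaussian
# density before the softmax.
import torch
import torch.nn as nn

class DensityAdaptiveAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.mu = nn.Parameter(torch.zeros(num_heads, 1, 1))       # learnable mean per head
        self.log_var = nn.Parameter(torch.zeros(num_heads, 1, 1))  # learnable variance per head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5    # (b, heads, n, n)
        # Gaussian modulation: scores near the learned mean are emphasized.
        density = torch.exp(-(scores - self.mu) ** 2 / (2 * self.log_var.exp()))
        attn = (scores * density).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

print(DensityAdaptiveAttention(256)(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```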
arXiv Detail & Related papers (2024-01-20T06:42:32Z)
- E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity [6.434967516411846]
We introduce the information entropy of hidden-state features into the design of a pruning metric, named E-Sparse.
E-Sparse uses this information richness to gauge channel importance, and incorporates several novel techniques to put it into effect.
E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
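As a hedged illustration of an entropy-aware N:M pruning metric in the spirit of the summary above (not the exact E-Sparse formulation), the sketch below scales weight magnitudes by the entropy of each input channel's activations and zeroes the lowest-scoring n entries in every group of m.

```python
# Hedged sketch of an entropy-aware N:M pruning metric (not the exact E-Sparse
# formulation): weight magnitudes are scaled by the entropy of each input
# channel's activations, then the n lowest-scoring entries per group of m are zeroed.
import torch

def channel_entropy(acts: torch.Tensor, bins: int = 32) -> torch.Tensor:
    """Shannon entropy of each input channel's activation histogram; acts: (tokens, in_dim)."""
    ent = []
    for c in range(acts.shape[1]):
        p = torch.histc(acts[:, c], bins=bins)
        p = p / p.sum()
        ent.append(-(p * torch.log(p + 1e-12)).sum())
    return torch.stack(ent)

def nm_prune(weight: torch.Tensor, acts: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero out n entries in every group of m along the input dimension of weight (out, in)."""
    metric = weight.abs() * channel_entropy(acts)               # broadcast over output rows
    out_dim, in_dim = weight.shape
    groups = metric.view(out_dim, in_dim // m, m)
    prune_idx = groups.topk(n, dim=-1, largest=False).indices  # smallest-metric entries
    mask = torch.ones_like(groups).scatter_(-1, prune_idx, 0.0).view(out_dim, in_dim)
    return weight * mask

w, a = torch.randn(8, 16), torch.randn(128, 16)
print((nm_prune(w, a) != 0).float().mean())                    # 0.5: two of every four weights pruned
```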
arXiv Detail & Related papers (2023-10-24T15:27:15Z)
- Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve it to a parameter-free version, Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap. These bounds show that, as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
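The summary does not reproduce the bound itself; for context, the average dynamic suboptimality gap over K episodes is conventionally defined as below, with V^*_{1,k} the optimal value of the episode-k (nonstationary) MDP and \pi_k the executed policy (notation assumed here, not taken from the paper).

```latex
% Conventional definition of the average dynamic suboptimality gap over K episodes
% (notation assumed; the paper's precise bound is not reproduced here):
\mathrm{Gap}(K) \;=\; \frac{1}{K}\sum_{k=1}^{K}
  \Big( V_{1,k}^{*}(s_{1,k}) \;-\; V_{1,k}^{\pi_k}(s_{1,k}) \Big)
```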
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
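A hedged sketch of the Gaussian-perturbation-plus-preference-ranking idea described above, written as a generic rank-based zeroth-order update on a prompt tensor; the function name, population size, and the toy score function are illustrative assumptions rather than the authors' Prompt-Tuning DT algorithm.

```python
# Generic rank-based zeroth-order update on a prompt tensor (a sketch in the
# spirit of the summary above, not the authors' Prompt-Tuning DT algorithm).
import torch

def prompt_tuning_step(prompt: torch.Tensor, score_fn, pop: int = 16,
                       sigma: float = 0.1, lr: float = 0.1) -> torch.Tensor:
    noise = torch.randn(pop, *prompt.shape) * sigma            # Gaussian perturbations
    candidates = prompt + noise                                # perturbed prompt trajectories
    scores = torch.tensor([score_fn(c) for c in candidates])
    ranks = scores.argsort().argsort().float()                 # preference ranking of candidates
    weights = ranks / (pop - 1) - 0.5                          # centered rank weights in [-0.5, 0.5]
    # Step along the rank-weighted average perturbation direction.
    direction = (weights.view(-1, *([1] * prompt.dim())) * noise).mean(0) / sigma
    return prompt + lr * direction

# Toy usage with a hypothetical scorer that prefers prompts close to all-ones.
prompt = torch.zeros(4, 8)
for _ in range(300):
    prompt = prompt_tuning_step(prompt, lambda p: -((p - 1) ** 2).sum().item())
print(prompt.mean())                                           # has drifted from 0 toward 1
```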
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
- HiFi: High-Information Attention Heads Hold for Parameter-Efficient Model Adaptation [0.8409934249521909]
We propose HiFi, a parameter-efficient fine-tuning method in which only the attention heads that are highly informative and strongly correlated for the specific task are fine-tuned.
We first model the relationships between heads as a graph from the two perspectives of information richness and correlation, and then apply the PageRank algorithm to determine the relative importance of each head.
Experiments on the GLUE benchmark demonstrate the effectiveness of our method, and show that HiFi obtains state-of-the-art performance over the prior baselines.
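A minimal sketch of the graph-plus-PageRank selection step described above, assuming edges come from correlations between pooled head outputs and node weights from an activation-variance proxy for information richness; the exact graph construction in HiFi may differ.

```python
# Sketch of head ranking via a correlation graph and PageRank (the exact HiFi
# graph construction may differ): nodes are heads, edge weights combine pairwise
# correlation with an activation-variance proxy for information richness.
import numpy as np

def rank_heads(head_outputs: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """head_outputs: (num_heads, num_samples) pooled activations per head."""
    num_heads = head_outputs.shape[0]
    corr = np.abs(np.corrcoef(head_outputs))           # pairwise correlation between heads
    richness = head_outputs.var(axis=1)                # proxy for information richness
    adj = corr * richness[None, :]                     # edges favor information-rich targets
    np.fill_diagonal(adj, 0.0)
    transition = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    score = np.full(num_heads, 1.0 / num_heads)
    for _ in range(iters):                             # power iteration for PageRank
        score = (1 - damping) / num_heads + damping * transition.T @ score
    return np.argsort(-score)                          # head indices, most important first

heads = np.random.randn(12, 256)                       # e.g. 12 heads, 256 pooled samples
print(rank_heads(heads)[:4])                           # the top-4 heads one would fine-tune
```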
arXiv Detail & Related papers (2023-05-08T09:31:13Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
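For reference, EPIG is commonly written as an expected mutual information between the prediction at a candidate input x and the prediction at a target input x_* drawn from a target distribution, in contrast to BALD's parameter-space information gain; the notation below follows the common presentation and may differ in detail from the paper.

```latex
% EPIG as expected predictive information gain over a target input distribution
% p_*(x_*), contrasted with BALD's parameter-space information gain:
\mathrm{EPIG}(x) \;=\; \mathbb{E}_{p_*(x_*)}\!\big[\, \mathrm{I}(y;\, y_* \mid x,\, x_*) \,\big],
\qquad
\mathrm{BALD}(x) \;=\; \mathrm{I}(y;\, \theta \mid x).
```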
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
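A hedged sketch of the budget-allocation step implied above: per-rank importance scores from every low-rank update compete for a global budget, and only the top-scoring ranks are kept. AdaLoRA's actual importance estimator and budget schedule are richer than this simplification.

```python
# Simplified budget-allocation step (AdaLoRA's importance estimator and
# schedule are richer): per-rank importance scores compete for a global budget
# and only the top-scoring ranks keep their parameters.
import torch

def allocate_budget(importance: dict, budget: int) -> dict:
    """importance: layer name -> per-rank importance scores. Returns 0/1 keep-masks."""
    all_scores = torch.cat(list(importance.values()))
    threshold = all_scores.topk(budget).values.min()    # global cut-off score
    return {name: (scores >= threshold).float() for name, scores in importance.items()}

# Toy usage: three layers with rank-8 updates and a total budget of 12 ranks.
scores = {f"layer{i}.lora": torch.rand(8) for i in range(3)}
masks = allocate_budget(scores, budget=12)
print({k: int(v.sum()) for k, v in masks.items()})      # per-layer ranks kept, 12 in total (barring ties)
```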
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
- Information-theoretic Inducing Point Placement for High-throughput Bayesian Optimisation [9.732863739456036]
We propose a novel inducing point design that uses a principled information-theoretic criterion to select inducing points.
By choosing inducing points to maximally reduce both global uncertainty and uncertainty in the maximum value of the objective function, we build surrogate models able to support high-precision high-throughput BO.
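A simplified greedy sketch of information-driven inducing-point selection, keeping only the "global uncertainty" part of the criterion described above (the max-value term is omitted); the RBF kernel, candidate set, and greedy loop are assumptions for illustration, not the paper's criterion.

```python
# Greedy sketch keeping only the global-uncertainty part of the criterion:
# each new inducing point is the candidate that most reduces the summed
# GP predictive variance over the candidate set (RBF kernel assumed).
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, ls: float = 1.0) -> np.ndarray:
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def select_inducing(candidates: np.ndarray, m: int, jitter: float = 1e-6) -> list:
    chosen = []
    for _ in range(m):
        best, best_var = None, np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            z = candidates[chosen + [i]]
            k_zz = rbf(z, z) + jitter * np.eye(len(z))
            k_xz = rbf(candidates, z)
            # Total Nyström posterior variance given the trial inducing set z.
            var = (1.0 - np.einsum("ij,jk,ik->i", k_xz, np.linalg.inv(k_zz), k_xz)).sum()
            if var < best_var:
                best, best_var = i, var
        chosen.append(best)
    return chosen

X = np.random.rand(50, 2)                 # candidate locations in the search space
print(select_inducing(X, m=5))            # indices of five well-spread inducing points
```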
arXiv Detail & Related papers (2022-06-06T08:56:56Z)
- Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding [60.226644697970116]
Domain classification is a fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.