Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
- URL: http://arxiv.org/abs/2406.13046v3
- Date: Mon, 28 Oct 2024 17:47:26 GMT
- Title: Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
- Authors: Cristian Meo, Ksenia Sycheva, Anirudh Goyal, Justin Dauwels,
- Abstract summary: It is a common practice in natural language processing to pre-train a single model and then fine-tune it for downstream tasks.
B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix.
B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70%.
- Score: 21.811889512977924
- License:
- Abstract: It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE)
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation [32.369133126167085]
We propose a new PEFT scheme called DiffoRA, which is theoretically grounded and enables module-wise adoption of LoRA.
At the core of our DiffoRA lies a Differential Adaptation Matrix (DAM) to determine which module is the most suitable and essential for fine-tuning.
Our approach achieves the best model accuracy over all the state-of-the-art baselines across various benchmarks.
arXiv Detail & Related papers (2025-02-13T02:41:34Z) - One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation [13.585425242072173]
Most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA)
We propose to improve LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition (SVD) on minibatches of activation.
We call our new method $textbfE$xplained $textbfV$ariance $textbfA$daptation (EVA)
arXiv Detail & Related papers (2024-10-09T17:59:06Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method.
We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA as a parameter-efficient fine-tuning(PEFT) method is widely used for not adding inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this, we design the Singular-Value Based Adaptive Low-Rank Adaption(SARA)
arXiv Detail & Related papers (2024-08-06T16:39:42Z) - BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models [34.1111413429869]
BiLoRA is an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO)
tested on ten datasets covering natural language understanding and generation tasks.
arXiv Detail & Related papers (2024-03-19T14:11:20Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [66.85589263870702]
Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component.
Experiments on finetuning RoBERTa and LLaMA-2 demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines.
arXiv Detail & Related papers (2023-11-20T18:57:41Z) - LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [104.23434818428062]
We focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model.
We propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework.
Experiments show that our method is highly effective and outperforms existing quantization methods.
arXiv Detail & Related papers (2023-10-12T18:34:08Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.