LoRA ensembles for large language model fine-tuning
- URL: http://arxiv.org/abs/2310.00035v2
- Date: Wed, 4 Oct 2023 19:13:35 GMT
- Title: LoRA ensembles for large language model fine-tuning
- Authors: Xi Wang, Laurence Aitchison, Maja Rudolph
- Abstract summary: Low-Rank Adapters (LoRA) is a parameter-efficient fine-tuning technique.
LoRA represents a very small number of parameters, orders of magnitude less than the underlying pre-trained model.
We find that LoRA ensembles, applied on its own or on top of pre-existing regularization techniques, gives consistent improvements in predictive accuracy and uncertainty quantification.
- Score: 35.78186948630364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as
overconfidence, poor calibration, and unreliable prediction results on test
data or out-of-distribution samples. One approach commonly used in vision for
alleviating this issue is a deep ensemble, which constructs an ensemble by
training the same model multiple times using different random initializations.
However, there is a huge challenge to ensembling LLMs: the most effective LLMs
are very, very large. Keeping a single LLM in memory is already challenging
enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many
settings. To address these issues, we propose an ensemble approach using
Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique.
Critically, these low-rank adapters represent a very small number of
parameters, orders of magnitude less than the underlying pre-trained model.
Thus, it is possible to construct large ensembles of LoRA adapters with almost
the same computational overhead as using the original model. We find that LoRA
ensembles, applied on its own or on top of pre-existing regularization
techniques, gives consistent improvements in predictive accuracy and
uncertainty quantification.
Related papers
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? [55.33467849079774]
Low-rank adaptation (LoRA) is a popular and efficient training technique for updating or domain-specific adaptation of Large Language Models.
We investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge.
arXiv Detail & Related papers (2025-02-20T12:31:03Z) - Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods.
In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z) - CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization [2.975939846457057]
Fine-tuning large language models (LLMs) using low-rank adaptation (LoRA) has become a highly efficient approach for downstream tasks.
Applying LoRA techniques to quantized LLMs poses unique challenges due to the reduced representational precision of quantized weights.
We introduce CLoQ, a simplistic initialization strategy designed to overcome these challenges.
arXiv Detail & Related papers (2025-01-30T16:48:15Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models [13.660511750245245]
This work introduces Bias-Alleviating Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance.
BA-LoRA incorporates three distinct regularization terms: (1) a consistency regularizer, (2) a diversity regularizer, and (3) a singular value decomposition regularizer.
The results demonstrate that BA-LoRA outperforms LoRA and its state-of-the-art variants.
arXiv Detail & Related papers (2024-08-08T16:13:26Z) - Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z) - LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [104.23434818428062]
We focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model.
We propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework.
Experiments show that our method is highly effective and outperforms existing quantization methods.
arXiv Detail & Related papers (2023-10-12T18:34:08Z) - NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA models with much fewer number of parameters compared to LoRA with rank one, the best compression LoRA can archive.
arXiv Detail & Related papers (2023-10-04T03:30:24Z) - QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [85.02796681773447]
We propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm.
The motivation lies in the imbalanced degrees of freedom of quantization and adaptation.
QA-LoRA is easily implemented with a few lines of code.
arXiv Detail & Related papers (2023-09-26T07:22:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.