LoRA ensembles for large language model fine-tuning
- URL: http://arxiv.org/abs/2310.00035v2
- Date: Wed, 4 Oct 2023 19:13:35 GMT
- Title: LoRA ensembles for large language model fine-tuning
- Authors: Xi Wang, Laurence Aitchison, Maja Rudolph
- Abstract summary: Low-Rank Adapters (LoRA) is a parameter-efficient fine-tuning technique.
LoRA adds only a very small number of parameters, orders of magnitude fewer than the underlying pre-trained model.
We find that LoRA ensembles, applied on their own or on top of pre-existing regularization techniques, give consistent improvements in predictive accuracy and uncertainty quantification.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as
overconfidence, poor calibration, and unreliable prediction results on test
data or out-of-distribution samples. One approach commonly used in vision for
alleviating this issue is a deep ensemble, which constructs an ensemble by
training the same model multiple times using different random initializations.
However, ensembling LLMs poses a major challenge: the most effective LLMs are
extremely large. Keeping a single LLM in memory is already demanding; keeping
an ensemble of, e.g., five LLMs in memory is impossible in many settings. To
address these issues, we propose an ensemble approach using
Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique.
Critically, these low-rank adapters represent a very small number of
parameters, orders of magnitude less than the underlying pre-trained model.
Thus, it is possible to construct large ensembles of LoRA adapters with almost
the same computational overhead as using the original model. We find that LoRA
ensembles, applied on their own or on top of pre-existing regularization
techniques, give consistent improvements in predictive accuracy and
uncertainty quantification.
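As a concrete illustration, below is a minimal PyTorch sketch of the idea (not the authors' code): several low-rank adapters with different random initializations share one frozen backbone, and the members' predictive distributions are averaged. The toy linear backbone, rank, and member count are illustrative stand-ins for an actual LLM setup.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + scale * (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no drift at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# One shared "pre-trained" backbone; each ensemble member adds only its adapter.
backbone = nn.Linear(32, 4)                          # toy stand-in for an LLM layer
members = [LoRALinear(backbone, rank=4) for _ in range(5)]  # 5 random initializations

# ... each member would be fine-tuned independently on the task data here ...

x = torch.randn(2, 32)
with torch.no_grad():
    # Ensemble prediction: average the members' predictive distributions.
    probs = torch.stack([m(x).softmax(-1) for m in members]).mean(0)
```

Since members differ only in their A and B factors, the marginal memory cost per member is a few rank-sized matrices rather than a full copy of the model.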
Related papers
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
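The summary does not spell out the algorithm, but one plausible reading of combining gradient boosting with rank-1 adaptation is a boosting loop that repeatedly fits a rank-1 adapter and merges it into the weights. The sketch below illustrates that reading on a toy regression problem; it is an assumption, not the paper's actual procedure.

```python
import torch
import torch.nn.functional as F

def fit_rank1_adapter(W, X, Y, steps=200, lr=1e-2):
    """Fit one rank-1 correction u v^T on top of the frozen weights W."""
    u = torch.zeros(W.shape[0], 1, requires_grad=True)   # zero init, like LoRA's B
    v = torch.randn(W.shape[1], 1, requires_grad=True)
    opt = torch.optim.Adam([u, v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(X @ (W + u @ v.T).T, Y)
        loss.backward()
        opt.step()
    return (u @ v.T).detach()

# Boosting loop: each round learns a rank-1 "weak learner" and merges it into
# the weights, so later rounds fit what earlier rounds left unexplained.
W = torch.randn(4, 8)                                    # stand-in pre-trained weight
X, Y = torch.randn(64, 8), torch.randn(64, 4)
for _ in range(10):
    W = W + fit_rank1_adapter(W, X, Y)
```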
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
- BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
This work introduces Bias-Alleviating Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance.
BA-LoRA incorporates three distinct regularization terms: (1) a consistency regularizer, (2) a diversity regularizer, and (3) a singular value decomposition regularizer.
The results demonstrate that BA-LoRA outperforms LoRA and its state-of-the-art variants.
arXiv Detail & Related papers (2024-08-08T16:13:26Z)
- Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Large Language Models (LLMs) have demonstrated impressive performance across various domains.
Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA).
We propose Quantized LLMs with Balanced-rank Adaptation (Q-BaRA) and Quantization-Aware Fine-tuning with Higher Rank Adaptation (QA-HiRA).
arXiv Detail & Related papers (2024-07-24T06:16:37Z)
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
Fine-tuning large language models (LLMs) for diverse applications is crucial to meeting complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm model performance for task-specific fine-tuned LLMs.
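For context, here is a minimal sketch of the kind of low-rank delta compression the summary refers to, using a truncated SVD; Delta-CoMe's own mixed-precision scheme is not shown, and the sizes below are arbitrary.

```python
import torch

def lowrank_delta(W_base, W_finetuned, rank):
    """Compress the fine-tuning delta with a truncated SVD."""
    delta = W_finetuned - W_base
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep only the top-`rank` singular directions of the delta.
    return (U[:, :rank] * S[:rank]) @ Vh[:rank]

W_base = torch.randn(256, 256)
W_ft = W_base + 0.01 * torch.randn(256, 256)
W_restored = W_base + lowrank_delta(W_base, W_ft, rank=16)
# A random delta has a nearly flat spectrum, so the relative error here is
# large -- the kind of loss the paper observes for naive low-rank compression.
print(((W_restored - W_ft).norm() / (W_ft - W_base).norm()).item())
```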
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
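The framework itself is not detailed in the summary; as a hedged illustration of uncertainty over generated explanations, one can estimate confidence from answer agreement across stochastic samples. `sample_fn` below is a hypothetical stand-in for a single LLM generation.

```python
from collections import Counter

def agreement_confidence(sample_fn, prompt, n_samples=20):
    """Estimate confidence as agreement among sampled (explanation, answer) pairs.

    `sample_fn(prompt)` is a hypothetical stand-in for one stochastic LLM
    generation that returns an (explanation, answer) tuple.
    """
    answers = [sample_fn(prompt)[1] for _ in range(n_samples)]
    answer, freq = Counter(answers).most_common(1)[0]
    return answer, freq / n_samples   # empirical agreement as a confidence score
```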
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
We introduce Dynamic Sparse No Training (DSnoT), a training-free approach for fine-tuning sparse large language models (LLMs).
Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs.
Our paper offers fresh insights into fine-tuning sparse LLMs efficiently without training and opens new avenues for scaling the potential of sparsity to LLMs.
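Below is a toy sketch of what a training-free, reconstruction-driven mask update can look like, under the assumption that DSnoT-style updates swap pruned and kept weights to shrink the gap between sparse and dense outputs; the paper's actual growing and pruning criteria are more refined.

```python
import torch

def dsnot_sweep(W, mask, X):
    """One simplified prune-and-grow sweep in the spirit of DSnoT: per output
    row, swap one pruned weight for one kept weight whenever that moves the
    sparse layer's output toward the dense layer's output. No gradients, no
    training. A toy sketch, not the paper's exact criterion."""
    x_sum = X.sum(0)                               # (in,)
    err = (W * mask) @ x_sum - W @ x_sum           # sparse minus dense output, (out,)
    # Signed contribution of each weight to its row's reconstruction error.
    contrib = err.sign()[:, None] * (W * x_sum)    # (out, in)
    for j in range(W.shape[0]):
        row = contrib[j]
        grow = row.masked_fill(mask[j] == 1, float("inf")).argmin()
        drop = row.masked_fill(mask[j] == 0, float("-inf")).argmax()
        if row[grow] < 0 < row[drop]:              # swap only if it shrinks the error
            mask[j, grow], mask[j, drop] = 1.0, 0.0
    return mask

W = torch.randn(8, 16)
mask = (torch.rand(8, 16) > 0.5).float()           # random 50% sparsity pattern
mask = dsnot_sweep(W, mask, torch.randn(32, 16))
```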
arXiv Detail & Related papers (2023-10-13T07:38:52Z)
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
We focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model.
We propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework.
Experiments show that our method is highly effective and outperforms existing quantization methods.
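A minimal sketch of an alternating quantize-then-refit scheme of the kind LoftQ describes: repeatedly quantize the residual and refit a low-rank factor so that Q + A @ B tracks the original weight. The uniform fake-quantizer, rank, and step count are illustrative assumptions.

```python
import torch

def fake_quant(W, bits=4):
    """Uniform round-to-nearest quantizer (a stand-in for NF4 and friends)."""
    scale = W.abs().max() / (2 ** (bits - 1) - 1)
    q = (W / scale).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

def alternating_init(W, rank=16, steps=5):
    """Alternate quantization and low-rank fitting so that Q + A @ B ~= W."""
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(rank, W.shape[1])
    for _ in range(steps):
        Q = fake_quant(W - A @ B)                  # quantize what low-rank can't explain
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        A, B = U[:, :rank] * S[:rank], Vh[:rank]   # best rank-r fit of the residual
    return Q, A, B

Q, A, B = alternating_init(torch.randn(128, 128))
```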
arXiv Detail & Related papers (2023-10-12T18:34:08Z)
- NOLA: Compressing LoRA using Linear Combination of Random Basis
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA matches the performance of LoRA while using far fewer parameters than rank-one LoRA, the best compression standard LoRA can achieve.
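A minimal sketch of the idea as summarized: the adapter factors are linear combinations of fixed random basis matrices, so only the mixture coefficients are trained (and the bases can be regenerated from a seed). Basis count, rank, and initialization below are illustrative.

```python
import torch
import torch.nn as nn

class NOLALinear(nn.Module):
    """LoRA-style adapter whose factors are linear combinations of FIXED random
    basis matrices; only the 2k mixture coefficients are trained, so storage
    can drop below even a rank-1 LoRA (the bases regenerate from the seed)."""
    def __init__(self, base: nn.Linear, rank=4, k=32, seed=0):
        super().__init__()
        self.base = base
        for p in base.parameters():
            p.requires_grad = False
        g = torch.Generator().manual_seed(seed)
        self.register_buffer("A_basis",
            torch.randn(k, rank, base.in_features, generator=g))
        self.register_buffer("B_basis",
            torch.randn(k, base.out_features, rank, generator=g))
        self.alpha = nn.Parameter(torch.randn(k) / k)   # trainable coefficients
        self.beta = nn.Parameter(torch.zeros(k))        # zero init: no drift at start

    def forward(self, x):
        A = torch.einsum("k,kri->ri", self.alpha, self.A_basis)
        B = torch.einsum("k,kor->or", self.beta, self.B_basis)
        return self.base(x) + x @ A.T @ B.T

layer = NOLALinear(nn.Linear(64, 64))
out = layer(torch.randn(2, 64))                         # (2, 64)
```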
arXiv Detail & Related papers (2023-10-04T03:30:24Z)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
We propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm.
The motivation lies in the imbalanced degrees of freedom of quantization and adaptation.
QA-LoRA is easily implemented with a few lines of code.
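As a hedged reading of those few lines: the sketch below average-pools input channels over quantization groups before the LoRA down-projection, so each group shares one adapter coefficient. Names and shapes are hypothetical, not the reference implementation.

```python
import torch
import torch.nn as nn

class GroupPooledLoRA(nn.Module):
    """Hypothetical sketch of the group-wise trick as read from QA-LoRA's
    summary: the input is average-pooled over quantization groups before the
    LoRA down-projection, so each group of input channels shares one adapter
    coefficient and the learned update can later be folded into group-wise
    quantization parameters. See the paper for the actual operator."""
    def __init__(self, in_features, out_features, rank=4, group=8):
        super().__init__()
        assert in_features % group == 0
        self.group = group
        self.A = nn.Parameter(torch.randn(in_features // group, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_features))

    def forward(self, x):                              # x: (batch, in_features)
        b, d = x.shape
        pooled = x.view(b, d // self.group, self.group).mean(-1)
        return pooled @ self.A @ self.B                # added to the quantized base output
```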
arXiv Detail & Related papers (2023-09-26T07:22:23Z)