Chain of LoRA: Efficient Fine-tuning of Language Models via Residual
Learning
- URL: http://arxiv.org/abs/2401.04151v1
- Date: Mon, 8 Jan 2024 14:26:49 GMT
- Title: Chain of LoRA: Efficient Fine-tuning of Language Models via Residual
Learning
- Authors: Wenhan Xia, Chengwei Qin, Elad Hazan
- Abstract summary: We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm.
We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
- Score: 31.036465632204663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning is the primary methodology for tailoring pre-trained large
language models to specific tasks. As the model's scale and the diversity of
tasks expand, parameter-efficient fine-tuning methods are of paramount
importance. One of the most widely used families of methods is low-rank
adaptation (LoRA) and its variants. LoRA encodes the weight update as the product
of two low-rank matrices. Despite its advantages, LoRA falls short of
full-parameter fine-tuning in terms of generalization error for certain tasks.
We introduce Chain of LoRA (COLA), an iterative optimization framework
inspired by the Frank-Wolfe algorithm, to bridge the gap between LoRA and full
parameter fine-tuning, without incurring additional computational costs or
memory overheads. COLA employs a residual learning procedure in which it merges
learned LoRA modules into the pre-trained language model parameters and
re-initializes optimization for newly created LoRA modules. We provide theoretical
convergence guarantees as well as empirical results to validate the
effectiveness of our algorithm. Across various models (OPT and Llama-2) and
seven benchmarking tasks, we demonstrate that COLA can consistently outperform
LoRA without additional computational or memory costs.
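The merge-and-restart procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single linear layer trained by plain gradient descent on a squared-error loss, and the function names (`train_lora`, `chain_of_lora`) and all hyperparameters (rank, number of links, steps, learning rate) are hypothetical choices made for illustration only.

```python
import numpy as np

def loss(W, X, Y):
    # Mean squared error of the linear model X @ W against targets Y.
    return float(np.mean((X @ W - Y) ** 2))

def train_lora(W, X, Y, rank, steps, lr, rng):
    # Train one low-rank update B @ A on top of the frozen weights W.
    d_in, d_out = W.shape
    B = np.zeros((d_in, rank))                     # zero init: B @ A starts at 0
    A = rng.normal(scale=0.1, size=(rank, d_out))  # small random init
    n = X.shape[0]
    for _ in range(steps):
        R = X @ (W + B @ A) - Y                    # residual of the adapted model
        G = 2.0 * X.T @ R / (n * d_out)            # gradient w.r.t. the effective weight
        grad_B, grad_A = G @ A.T, B.T @ G          # chain rule through B @ A
        B -= lr * grad_B
        A -= lr * grad_A
    return B, A

def chain_of_lora(W0, X, Y, rank=2, links=3, steps=300, lr=0.05, seed=0):
    # Assumed toy version of the chain: train a module, merge it, start a new one.
    rng = np.random.default_rng(seed)
    W = W0.copy()
    history = [loss(W, X, Y)]
    for _ in range(links):
        B, A = train_lora(W, X, Y, rank, steps, lr, rng)
        W = W + B @ A                              # merge the learned module into W
        history.append(loss(W, X, Y))              # the next link fits what remains
    return W, history

# Toy data: W0 stands in for pre-trained weights, W_star for the fine-tuning target.
rng = np.random.default_rng(1)
X = rng.normal(size=(128, 8))
W_star = rng.normal(size=(8, 6))
Y = X @ W_star
W0 = rng.normal(scale=0.1, size=(8, 6))
W_final, history = chain_of_lora(W0, X, Y)
print([round(h, 4) for h in history])
```

In this sketch only one rank-`rank` pair `(B, A)` is trainable at any time, so the trainable parameter count matches a single LoRA module, while the merged weight can accumulate an update of rank up to `links * rank`.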
Related papers
- OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models [0.0]
Low-Rank Adaptation (LoRA) has emerged as a promising method to mitigate these issues.
OLoRA significantly accelerates the convergence of LLM training.
OLoRA exhibits improved performance compared to standard LoRA across a variety of language modeling tasks.
(2024-06-03T20:37:27Z)
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters [11.23006032094776]
We introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel approach for parameter-efficient fine-tuning.
LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA.
(2024-05-27T19:07:13Z)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
(2024-05-20T15:48:32Z)
- Improving LoRA in Privacy-preserving Federated Learning [44.47315926976059]
Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models.
This paper proposes an efficient and effective version of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges.
(2024-03-18T23:20:08Z)
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA).
Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA.
The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
(2024-02-28T04:33:20Z)
- DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA).
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
(2024-02-14T17:59:34Z)
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [104.23434818428062]
We focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model.
We propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework.
Experiments show that our method is highly effective and outperforms existing quantization methods.
(2023-10-12T18:34:08Z)
- NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA models while using far fewer parameters than LoRA with rank one, the best compression LoRA can achieve.
(2023-10-04T03:30:24Z)
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning [34.109808214968176]
Generalized LoRA (GLoRA) is an advanced approach for universal parameter-efficient fine-tuning tasks.
It employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations.
GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities.
(2023-06-13T17:59:32Z)
- LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaptation (LoRA) has emerged as a way to fine-tune large language models (LLMs).
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
(2023-05-28T15:15:48Z)
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
(2023-03-18T22:36:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.