Sparse Low-rank Adaptation of Pre-trained Language Models
- URL: http://arxiv.org/abs/2311.11696v1
- Date: Mon, 20 Nov 2023 11:56:25 GMT
- Title: Sparse Low-rank Adaptation of Pre-trained Language Models
- Authors: Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan
Liu, Maosong Sun
- Abstract summary: We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
- Score: 79.74094517030035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pre-trained large language models in a parameter-efficient manner
is widely studied for its effectiveness and efficiency. The popular method of
low-rank adaptation (LoRA) offers a notable approach, hypothesizing that the
adaptation process is intrinsically low-dimensional. Although LoRA has
demonstrated commendable performance, it is implemented with a fixed and
unalterable intrinsic rank that might not always be the ideal choice.
Recognizing the need for more flexible adaptation, we extend the methodology of
LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that
enables dynamic adjustments to the intrinsic rank during the adaptation
process. We achieve this through the incorporation of a gate unit optimized
with proximal gradient method in the training stage, controlling the
cardinality of rank under the sparsity of the gate. In the subsequent inference
stage, we eliminate the parameter blocks corresponding to the zeroed-out ranks,
to reduce each SoRA module back to a concise yet rank-optimal LoRA. Our
approach strengthens the representation power of LoRA by initializing it with a
higher rank, while efficiently taming a temporarily increased number of
parameters via updating in a sparse way. We further introduce a sparsifying
scheduler for SoRA, aiming to examine the impact of the number of non-zero
parameters on the model's memorization and generalization. Our experimental
results demonstrate that SoRA can outperform other baselines even with 70%
retained parameters and 70% training time.
Related papers
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models [13.56631686493347]
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks.
We propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure.
arXiv Detail & Related papers (2024-10-22T08:27:23Z) - Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix.
A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance.
We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z) - ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models [8.251547772610301]
We extend the methodology of low-rank adaptation (LoRA) to an innovative approach we call allocating low-rank adaptation (ALoRA)
First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank.
Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules needing higher ranks.
arXiv Detail & Related papers (2024-03-24T15:09:55Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - IncreLoRA: Incremental Parameter Allocation Method for
Parameter-Efficient Fine-tuning [15.964205804768163]
IncreLoRA is an incremental parameter allocation method that adaptively adds trainable parameters during training.
We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA.
arXiv Detail & Related papers (2023-08-23T10:08:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.