Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
- URL: http://arxiv.org/abs/2410.16801v1
- Date: Tue, 22 Oct 2024 08:27:23 GMT
- Title: Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
- Authors: Yuheng Lu, Bingshuo Qian, Caixia Yuan, Huixing Jiang, Xiaojie Wang,
- Abstract summary: Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks.
We propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure.
- Score: 13.56631686493347
- License:
- Abstract: Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks, where adaptation to a new domain leads to a substantial decline in performance on previous tasks. In this paper, we propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure. Aiming to reduce the scale of output change while introduce minimal constraint on model capacity, CLoRA imposes constraint on the direction of updating matrix null space. Experimental results on commonly used LLM finetuning tasks reveal that CLoRA significantly outperforms existing LoRA subsequent methods on both in-domain and outdomain evaluations, highlighting the superority of CLoRA as a effective parameter-efficient finetuning method with catastrophic forgetting mitigating. Further investigation for model parameters indicates that CLoRA effectively balances the trade-off between model capacity and degree of forgetting.
Related papers
- BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.
We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z) - S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks.
We propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters.
Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL.
arXiv Detail & Related papers (2025-01-22T20:00:41Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix.
A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance.
We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach [10.980433187379868]
Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results.
This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension.
arXiv Detail & Related papers (2024-07-16T15:26:31Z) - LoRA Dropout as a Sparsity Regularizer for Overfitting Control [18.992276878667997]
We propose a LoRA Dropout mechanism for the LoRA-based methods.
We show that appropriate sparsity would help tighten the gap between empirical and generalization risks.
arXiv Detail & Related papers (2024-04-15T09:32:12Z) - ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models [8.251547772610301]
We extend the methodology of low-rank adaptation (LoRA) to an innovative approach we call allocating low-rank adaptation (ALoRA)
First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank.
Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules needing higher ranks.
arXiv Detail & Related papers (2024-03-24T15:09:55Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.