Related papers: S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning

S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning

URL: http://arxiv.org/abs/2501.13198v2
Date: Thu, 30 Jan 2025 17:55:31 GMT
Title: S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning
Authors: Yichen Wu, Hongming Piao, Long-Kai Huang, Renzhen Wang, Wanhua Li, Hanspeter Pfister, Deyu Meng, Kede Ma, Ying Wei,
Abstract summary: Continual Learning with foundation models has emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks.<n>We propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters.<n>Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL.
Score: 73.93639228235622
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Continual Learning with foundation models has recently emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks. Existing prompt-based methods generally use a gating mechanism to select relevant prompts aligned with the test query for further processing. However, the success of these methods largely depends on the precision of the gating mechanism, which becomes less scalable with additional computational overhead as tasks increases. To overcome these issues, we propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters. S-LoRA supports efficient inference by employing the last-stage trained model for direct testing without a gating process. Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL. Furthermore, based on our findings, we develop variants of S-LoRA with further improved scalability. Extensive experiments across multiple CL benchmarks and various foundation models consistently validate the effectiveness of S-LoRA.

Related papers

Reinforcement Learning for LLM Reasoning Under Memory Constraints [0.02488650627593658]
We introduce S-GRPO, a memory-efficient variant of Group Relative Policy Optimization, and T-SPMO, a token-level prefix matching strategy for fine-grained credit assignment. Despite limited resources, when used to fine-tune Qwen2-1.5B both methods significantly improve SVAMP benchmark accuracy from 46% to above 70% using LoRA training. We find that our full-token GRPO baseline under LoRA fine-tuning did not improve model performance (compared to base model) on either task.
arXiv Detail & Related papers (2025-04-29T14:58:43Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning. Current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data. We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z)
Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models [38.97142043836567]
Continual learning (CL) aims to enable vision transformers (ViTs) to learn new tasks over time. catastrophic forgetting remains a persistent challenge. We propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA)
arXiv Detail & Related papers (2024-11-01T14:28:39Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models [13.56631686493347]
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks. We propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure.
arXiv Detail & Related papers (2024-10-22T08:27:23Z)
Is Parameter Collision Hindering Continual Learning in LLMs? [50.57658782050275]
Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially.<n>We show that building non-collision parameters is a more critical interdependence factor in addressing CL challenges.<n>We propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach leveraging low collision rates to enhance CL in LLMs.
arXiv Detail & Related papers (2024-10-14T05:54:11Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models. LoRA often under performs when compared to full- parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
LoRA Dropout as a Sparsity Regularizer for Overfitting Control [18.992276878667997]
We propose a LoRA Dropout mechanism for the LoRA-based methods. We show that appropriate sparsity would help tighten the gap between empirical and generalization risks.
arXiv Detail & Related papers (2024-04-15T09:32:12Z)
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA [45.38491644250814]
Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA) is an intra-layer sharing mechanism. PRoLoRA retains its advantages, and effectively circumvents the drawbacks of peer parameter-sharing methods. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA.
arXiv Detail & Related papers (2024-02-24T13:39:05Z)
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.