Merge before Forget: A Single LoRA Continual Learning via Continual Merging
- URL: http://arxiv.org/abs/2512.23017v1
- Date: Sun, 28 Dec 2025 17:37:57 GMT
- Title: Merge before Forget: A Single LoRA Continual Learning via Continual Merging
- Authors: Fuli Qiao, Mehrdad Mahdavi
- Abstract summary: Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs or generate data representations to overcome forgetting. We propose a novel continual learning method that sequentially merges LoRA updates into a single unified LoRA.
- Score: 13.950131092976248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs or generate data representations to overcome forgetting, typically using these to help new LoRAs learn new tasks. However, these methods not only ignore the computational memory that grows with the number of tasks and the limits of storage space but also suffer from potential task interference due to the lack of effective LoRA merging mechanisms. In this paper, we propose a novel continual learning method that orthogonally initializes and sequentially merges LoRA updates into a single unified LoRA. Our method leverages orthogonal basis extraction from the previously learned LoRA to initialize the learning of new tasks, and further exploits the intrinsic asymmetry property of LoRA components by using a time-aware scaling mechanism to balance new and old knowledge during continual merging. Our approach maintains constant memory complexity with respect to the number of tasks, minimizes interference between past and new tasks via orthogonal basis initialization, and improves performance over asymmetric LoRA merging via adaptive scaling. We provide theoretical analysis to justify our design and conduct extensive experiments across diverse continual learning benchmarks using various Llama models, demonstrating the effectiveness and efficiency of our method.
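The abstract names two ingredients, orthogonal initialization from the previous LoRA and a time-aware scaling during merging, but this listing gives no formulas for either. The NumPy sketch below is only an illustration of the general idea under stated assumptions, not the authors' algorithm: `orthogonal_init` and `merge_step` are hypothetical names, the 1/t running-average weight is a stand-in for the paper's scaling rule, the SVD truncation is one simple way to keep a single rank-r LoRA, and the asymmetry between the A and B factors is not modeled.

```python
import numpy as np


def orthogonal_init(A_prev, rank, rng):
    """Return a (rank x d) matrix with orthonormal rows that are orthogonal
    to every row of A_prev (the previous LoRA down-projection).
    Assumes rank + A_prev.shape[0] <= d so enough free directions remain."""
    # Orthonormal basis covering the row space of A_prev.
    Q_prev, _ = np.linalg.qr(A_prev.T)       # shape (d, r_prev)
    d = A_prev.shape[1]
    G = rng.standard_normal((d, rank))       # random candidate directions
    G -= Q_prev @ (Q_prev.T @ G)             # remove the old-subspace component
    Q_new, _ = np.linalg.qr(G)               # orthonormalize what is left
    return Q_new.T


def merge_step(B_merged, A_merged, B_new, A_new, t):
    """Fold the task-t LoRA into the merged LoRA and return rank-r factors.
    ASSUMPTION: the weight 1/t (a running average) stands in for the paper's
    time-aware scaling; the SVD truncation is also an illustrative choice."""
    rank = A_merged.shape[0]
    lam = 1.0 / t
    delta = (1.0 - lam) * (B_merged @ A_merged) + lam * (B_new @ A_new)
    # Keep only the top-r singular directions so memory stays constant in t.
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]


# Usage with random stand-ins for trained factors (toy sizes).
rng = np.random.default_rng(0)
d_in, d_out, r = 32, 16, 4
A1 = rng.standard_normal((r, d_in))
B1 = rng.standard_normal((d_out, r))
A2 = orthogonal_init(A1, r, rng)             # task-2 initialization
B2 = rng.standard_normal((d_out, r))         # stands in for a trained B
B_m, A_m = merge_step(B1, A1, B2, A2, t=2)
print(np.abs(A1 @ A2.T).max())               # floating-point noise, near zero
```

With this weighting, every task seen so far contributes equally to the merged update before truncation, and the stored state is always one pair of rank-r factors regardless of how many tasks have been merged.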
Related papers
- Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning [82.30237756328596]
Low-Rank Adaptation (LoRA) has gained increasing attention in Continual Learning (CL). Several LoRA-based CL methods reduce interference across tasks by separating their update spaces. LoDA performs a task-driven decomposition to build general and truly task-specific LoRA subspaces.
arXiv Detail & Related papers (2026-02-27T02:31:00Z)
- Shared LoRA Subspaces for almost Strict Continual Learning [32.4267950435704]
Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment. We propose Share, a novel approach to parameter efficient continual finetuning that learns and dynamically updates a single, shared low-rank subspace. A single Share model can replace hundreds of task-specific LoRA adapters, supporting scalable, asynchronous continual learning.
arXiv Detail & Related papers (2026-02-05T18:59:58Z)
- Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
arXiv Detail & Related papers (2026-01-30T10:54:51Z)
- KeepLoRA: Continual Learning with Residual Gradient Adaptation [70.16296045857659]
Continual learning for pre-trained vision-language models requires balancing three competing objectives. This paper presents a simple but effective approach called KeepLoRA to balance these objectives.
arXiv Detail & Related papers (2026-01-27T14:38:57Z)
- LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning [12.165720711684758]
We introduce LiLoRA, a highly efficient architecture expansion method tailored for CVIT in MLLMs. LiLoRA shares the LoRA matrix A across tasks to reduce redundancy, applies an additional low-rank decomposition to matrix B to minimize task-specific parameters, and incorporates a cosine-regularized stability loss to preserve consistency over time. Experiments show that LiLoRA consistently achieves superior performance in sequential task learning while significantly improving parameter efficiency compared to existing approaches.
arXiv Detail & Related papers (2025-08-08T10:32:38Z)
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [57.514786046966265]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
- LoRA-Based Continual Learning with Constraints on Critical Parameter Changes [7.634417409656999]
LoRA-based continual learning represents a promising avenue for leveraging pre-trained models in downstream continual learning tasks. We propose freezing the most critical parameter matrices in the Vision Transformer (ViT) for pre-tasks before learning post-tasks. Our results indicate that our method achieves state-of-the-art (SOTA) performance on several well-known continual learning benchmarks.
arXiv Detail & Related papers (2025-04-18T02:08:19Z)
- C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models [26.560293264523903]
Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that has been extensively applied in areas such as natural language processing and computer vision. We propose Continual Low-Rank Adaptation (C-LoRA), a novel extension of LoRA for continual learning. C-LoRA uses a learnable routing matrix to dynamically manage parameter updates across tasks.
arXiv Detail & Related papers (2025-02-25T07:35:36Z)
- SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks. Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks. We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z)
- Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [26.079123341965687]
We study low-rank learning and analyze how LoRA ranks and placements affect learning and forgetting. A higher-rank LoRA improves task learning (plasticity) but increases forgetting, while a lower-rank LoRA enhances stability but limits adaptation. Motivated by this, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which continually updates PTMs with LoRA adapters of adaptively optimized ranks.
arXiv Detail & Related papers (2024-12-01T23:41:42Z)
- Learning Attentional Mixture of LoRAs for Language Model Continual Learning [5.405488709294211]
Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach for continual learning on new tasks.
We propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs.
arXiv Detail & Related papers (2024-09-29T08:34:54Z)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z)