KeepLoRA: Continual Learning with Residual Gradient Adaptation
- URL: http://arxiv.org/abs/2601.19659v1
- Date: Tue, 27 Jan 2026 14:38:57 GMT
- Title: KeepLoRA: Continual Learning with Residual Gradient Adaptation
- Authors: Mao-Lin Luo, Zi-Hao Zhou, Yi-Lin Zhang, Yuanyu Wan, Tong Wei, Min-Ling Zhang
- Abstract summary: Continual learning for pre-trained vision-language models requires balancing three competing objectives. This paper presents a simple but effective approach called KeepLoRA to balance these objectives.
- Score: 70.16296045857659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents a simple but effective approach called KeepLoRA to balance these objectives. We first analyze the knowledge retention mechanism within the model parameter space and find that general knowledge is mainly encoded in the principal subspace, while task-specific knowledge is encoded in the residual subspace. Motivated by this finding, KeepLoRA learns new tasks by restricting LoRA parameter updates to the residual subspace, which avoids interfering with previously learned capabilities. Specifically, we infuse knowledge for a new task by projecting its gradient onto a subspace orthogonal to both the principal subspace of the pre-trained model and the dominant directions of previous task features. Our theoretical and empirical analyses confirm that KeepLoRA balances the three objectives and achieves state-of-the-art performance. The implementation code is available at https://github.com/MaolinLuo/KeepLoRA.
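The gradient projection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released KeepLoRA code, and it rests on several assumptions that do not come from the paper. A linear layer computes `W @ x` with `W` of shape (d_out, d_in). The "principal subspace" of the pre-trained model is taken to be the top right singular vectors of `W`. Each earlier task's "dominant directions" are the top left singular vectors of its input-feature matrix. Both are selected by a hypothetical 90% spectral-energy threshold. The helper names `top_directions`, `protected_basis` and `project_to_residual` are illustrative only.

```python
import numpy as np


def top_directions(M: np.ndarray, energy: float) -> np.ndarray:
    """Leading left singular vectors of M that capture >= `energy` of its squared spectrum."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    cum = np.cumsum(S**2) / np.sum(S**2)
    k = min(int(np.searchsorted(cum, energy)) + 1, len(S))
    return U[:, :k]


def protected_basis(W: np.ndarray, prev_feats: list, energy: float = 0.9) -> np.ndarray:
    """Orthonormal (d_in, m) basis of input directions to protect (hypothetical helper).

    W          -- pre-trained weight, shape (d_out, d_in); the layer computes W @ x.
    prev_feats -- list of (d_in, n_t) input-feature matrices from earlier tasks.
    """
    # Assumed principal subspace of W on the input side: left singular vectors of W.T.
    blocks = [top_directions(W.T, energy)]
    # Assumed dominant directions of each earlier task's input features.
    blocks += [top_directions(X, energy) for X in prev_feats]
    # Re-orthonormalize the union and drop numerically redundant directions.
    U, S, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return U[:, S > 1e-8 * S.max()]


def project_to_residual(grad: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Remove the component of a (rows, d_in) gradient that acts on span(Q)."""
    return grad - (grad @ Q) @ Q.T


# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
d_out, d_in, rank, lr = 16, 64, 4, 0.1
W = rng.standard_normal((d_out, d_in))
feats_task1 = rng.standard_normal((d_in, 10))
Q = protected_basis(W, [feats_task1])

# LoRA down-projection A, initialized inside the residual subspace.
A0 = project_to_residual(rng.standard_normal((rank, d_in)), Q)
grad_A = rng.standard_normal((rank, d_in))  # stand-in for dL/dA on the new task
g_res = project_to_residual(grad_A, Q)
A1 = A0 - lr * g_res
print(Q.shape[1], np.abs(A1 @ Q).max())
```

Because `g_res @ Q` is zero, a step along `g_res` cannot change how `A` acts on span(Q). If `A` also starts in the residual subspace, as `A0` does here, then `A @ x` is zero for every `x` in span(Q). The LoRA branch `B @ A @ x` therefore leaves those inputs untouched however `B` is trained.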
Related papers
- Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning [82.30237756328596]
  - Summary: Low-Rank Adaptation (LoRA) has gained increasing attention in Continual Learning (CL). Several LoRA-based CL methods reduce interference across tasks by separating their update spaces. LoDA performs a task-driven decomposition to build general and truly task-specific LoRA subspaces.
  - Date: 2026-02-27T02:31:00Z
- Shared LoRA Subspaces for almost Strict Continual Learning [32.4267950435704]
  - Summary: Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment. We propose Share, a novel approach to parameter-efficient continual finetuning that learns and dynamically updates a single, shared low-rank subspace. A single Share model can replace hundreds of task-specific LoRA adapters, supporting scalable, asynchronous continual learning.
  - Date: 2026-02-05T18:59:58Z
- Merge before Forget: A Single LoRA Continual Learning via Continual Merging [13.950131092976248]
  - Summary: Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs, or generate data representations, to overcome forgetting. We propose a novel continual learning method that sequentially merges LoRA updates into a single unified LoRA.
  - Date: 2025-12-28T17:37:57Z
- SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA [12.037328436961431]
  - Summary: Subspace-Constrained LoRA (SC-LoRA) is a novel LoRA framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. In our experiments, SC-LoRA delivers superior fine-tuning performance while markedly reducing knowledge forgetting.
  - Date: 2025-05-29T17:55:21Z
- SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting [68.00007494819798]
  - Summary: Continual learning requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL. It partitions the gradient space of previously learned tasks into two subspaces, and new tasks are learned within the minor subspace, which reduces interference with previously acquired knowledge. Existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability because it is hard to partition the gradient space appropriately.
  - Date: 2025-05-28T13:57:56Z
- Practical Continual Forgetting for Pre-trained Vision Models [61.41125567026638]
  - Summary: In real-world scenarios, selected information is expected to be continuously removed from a pre-trained model. We define this problem as continual forgetting and identify three key challenges. We first propose Group Sparse LoRA (GS-LoRA) to fine-tune the FFN layers in Transformer blocks for each forgetting task. We conduct extensive experiments on face recognition, object detection and image classification, and demonstrate that our method forgets specific classes with minimal impact on other classes.
  - Date: 2025-01-16T17:57:53Z
- Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [26.079123341965687]
  - Summary: We study low-rank learning and analyze how LoRA ranks and placements affect learning and forgetting. A higher-rank LoRA improves task learning (plasticity) but increases forgetting, while a lower-rank LoRA enhances stability but limits adaptation. Motivated by this, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which continually updates pre-trained models (PTMs) with LoRA adapters of adaptively optimized ranks.
  - Date: 2024-12-01T23:41:42Z
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
  - Summary: Domain-Class Incremental Learning is a realistic but challenging continual learning scenario. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. This introduces a new problem: adapting to new tasks may disturb the knowledge encoded in the pre-trained VLMs and compromise their inherent zero-shot ability. Existing methods address this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy overhead. We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
  - Date: 2024-07-07T12:19:37Z
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
  - Summary: Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net). IF2Net allows a single network to inherently learn unlimited mapping rules without requiring task identities at test time.
  - Date: 2023-06-18T05:26:49Z
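The Merge before Forget entry describes sequentially merging each task's LoRA update into a single unified LoRA, but its summary does not say how the merge is performed. The NumPy sketch below therefore substitutes a generic technique: it re-factorizes the summed low-rank update with a truncated SVD. It is not that paper's method. The helper `merge_lora` and all ranks and sizes are hypothetical.

```python
import numpy as np


def merge_lora(B, A, B_new, A_new, rank):
    """Fold a new low-rank update into an existing one, keeping one rank-`rank` adapter.

    Hypothetical helper: the summed update B @ A + B_new @ A_new is re-factorized by
    truncated SVD, which gives its best rank-`rank` approximation (Eckart-Young).
    """
    delta = B @ A + B_new @ A_new                 # (d_out, d_in) combined update
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(S[:rank])                      # split each singular value across both factors
    return U[:, :rank] * root, root[:, None] * Vt[:rank]


rng = np.random.default_rng(1)
d_out, d_in, r_task, r_merged = 32, 48, 4, 8

# Start from an empty merged adapter and fold in three rank-4 task adapters one by one.
B, A = np.zeros((d_out, r_merged)), np.zeros((r_merged, d_in))
task_updates, errors = [], []
for t in range(3):
    B_t = rng.standard_normal((d_out, r_task))
    A_t = rng.standard_normal((r_task, d_in))
    task_updates.append(B_t @ A_t)
    B, A = merge_lora(B, A, B_t, A_t, r_merged)
    target = sum(task_updates)
    errors.append(np.linalg.norm(B @ A - target) / np.linalg.norm(target))
    print(f"after task {t + 1}: relative error {errors[-1]:.3f}")
```

The first two merges are exact up to floating-point error, because the combined update has rank at most 8. The third merge is lossy, because three rank-4 updates add up to rank 12, which no longer fits in a rank-8 adapter. A fixed-size merged adapter has to trade away some of the earlier updates once their total rank exceeds its capacity.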