Continual Learning Beyond Experience Rehearsal and Full Model Surrogates
- URL: http://arxiv.org/abs/2505.21942v1
- Date: Wed, 28 May 2025 03:52:34 GMT
- Title: Continual Learning Beyond Experience Rehearsal and Full Model Surrogates
- Authors: Prashant Bhat, Laurens Niesten, Elahe Arani, Bahram Zonooz,
- Abstract summary: Continual learning has remained a significant challenge for deep neural networks.<n>Existing solutions often rely on experience rehearsal or full model surrogates to mitigate.<n>We propose a scalable CL approach that eliminates the need for experience rehearsal and full-model surrogates.
- Score: 17.236861687708096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) has remained a significant challenge for deep neural networks as learning new tasks erases previously acquired knowledge, either partially or completely. Existing solutions often rely on experience rehearsal or full model surrogates to mitigate CF. While effective, these approaches introduce substantial memory and computational overhead, limiting their scalability and applicability in real-world scenarios. To address this, we propose SPARC, a scalable CL approach that eliminates the need for experience rehearsal and full-model surrogates. By effectively combining task-specific working memories and task-agnostic semantic memory for cross-task knowledge consolidation, SPARC results in a remarkable parameter efficiency, using only 6% of the parameters required by full-model surrogates. Despite its lightweight design, SPARC achieves superior performance on Seq-TinyImageNet and matches rehearsal-based methods on various CL benchmarks. Additionally, weight re-normalization in the classification layer mitigates task-specific biases, establishing SPARC as a practical and scalable solution for CL under stringent efficiency constraints.
Related papers
- CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation [14.2843647693986]
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method for class-incremental semantic segmentation.<n>CLoRA significantly reduces the hardware requirements for training, making it well-suited for CL in resource-constrained environments after deployment.
arXiv Detail & Related papers (2025-07-26T09:36:05Z) - Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer [5.079602839359523]
Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge.<n>Recently, large pre-trained models have been widely adopted in CL for their ability to support both.<n>We propose two efficient frameworks tailored for class-incremental learning.
arXiv Detail & Related papers (2025-05-13T08:07:40Z) - LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large language models.<n> Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added.<n>Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z) - Continual learning via probabilistic exchangeable sequence modelling [6.269118318460723]
Continual learning (CL) refers to the ability to continuously learn and accumulate new knowledge while retaining useful information from past experiences.<n>We propose CL-BRUNO, a probabilistic, Neural Process-based CL model that performs scalable and tractable Bayesian update and prediction.
arXiv Detail & Related papers (2025-03-26T17:08:20Z) - DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning [22.386864304549285]
Continual learning (CL) is essential for Large Language Models (LLMs) to adapt to evolving real-world demands.<n>Recent rehearsal-free methods employ model-based and regularization-based strategies to address this issue.<n>We propose a $textbfD$e $textbfA$ttention-based $textbfTask $textbfA$daptation ( DATA)<n> DATA explicitly decouples and learns both task-specific and task-shared knowledge using high-rank and low-rank task adapters.
arXiv Detail & Related papers (2025-02-17T06:35:42Z) - SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks.<n>Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks.<n>We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z) - LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.<n>Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.<n>However, such methods lack theoretical guarantees, making them prone to unexpected failures.<n>We aim to bridge this gap by designing a simple CL method that is theoretically sound and highly performant.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time.<n>Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks.<n>MoE model has recently been shown to effectively mitigate catastrophic forgetting in CL, by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z) - Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z) - SpaceNet: Make Free Space For Continual Learning [15.914199054779438]
We propose a novel architectural-based method referred as SpaceNet for class incremental learning scenario.
SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task in a compact number of neurons.
Experimental results show the robustness of our proposed method against catastrophic forgetting old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model.
arXiv Detail & Related papers (2020-07-15T11:21:31Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs)
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL)
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.