Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
- URL: http://arxiv.org/abs/2512.01818v1
- Date: Mon, 01 Dec 2025 15:56:00 GMT
- Title: Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
- Authors: Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard Ghanem
- Abstract summary: Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods.
- Score: 51.07663354001582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises due to the model's tendency to overwrite previously acquired knowledge with new information. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods, which is based exclusively on the expected label distribution, thus making it class-agnostic. As a consequence, the IM regularizer can be directly integrated into various rehearsal-based continual learning methods, reducing forgetting and favoring faster convergence. Our empirical validation shows that, across datasets and regardless of the number of tasks, our proposed regularization strategy consistently improves baseline performance at the cost of only minimal computational overhead. The lightweight nature of IM ensures that it remains a practical and scalable solution, making it applicable to real-world continual learning scenarios where efficiency is paramount. Finally, we demonstrate the data-agnostic nature of our regularizer by applying it to video data, which presents additional challenges due to its temporal structure and higher memory requirements. Despite the significant domain gap, our experiments show that the IM regularizer also improves the performance of video continual learning methods.
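Since the abstract describes the IM term only at a high level, the snippet below is a minimal, hypothetical PyTorch sketch of the general idea: an entropy-based regularizer computed on the expected (batch-averaged) label distribution, added to a plain experience-replay update. The function names, the loss weight `lam`, and the way replay batches are mixed are illustrative assumptions and may not match the paper's exact formulation.

```python
# Hypothetical sketch: an entropy term over the expected (batch-averaged) label
# distribution, combined with a plain experience-replay update.
# Illustrative only; not the paper's exact method.
import torch
import torch.nn.functional as F

def im_regularizer(logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative entropy of the expected class distribution.

    The term depends only on the averaged softmax outputs (no ground-truth
    labels), which is what makes it class-agnostic. Minimizing it maximizes
    the entropy of the expected label distribution.
    """
    probs = F.softmax(logits, dim=1)      # per-sample class probabilities
    expected = probs.mean(dim=0)          # expected label distribution over the batch
    entropy = -(expected * (expected + eps).log()).sum()
    return -entropy                       # negate so minimizing the loss raises entropy

def replay_step(model, optimizer, current_batch, memory_batch, lam: float = 0.1):
    """One rehearsal update: cross-entropy on current + memory samples, plus the IM term."""
    x_cur, y_cur = current_batch
    x_mem, y_mem = memory_batch
    x = torch.cat([x_cur, x_mem], dim=0)
    y = torch.cat([y_cur, y_mem], dim=0)

    logits = model(x)
    loss = F.cross_entropy(logits, y) + lam * im_regularizer(logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the extra term only needs the logits that a rehearsal method already computes for its mixed current/memory batch, it can be bolted onto an existing replay loss without changing buffer management, which is consistent with the lightweight, plug-in behavior the abstract claims.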
Related papers
- Unbiased Online Curvature Approximation for Regularized Graph Continual Learning [9.70311578832594]
Graph continual learning (GCL) aims to learn from a continuous sequence of graph-based tasks. Regularization methods are vital for preventing catastrophic forgetting in GCL. We propose a new unbiased online curvature approximation of the full Fisher information matrix (FIM) based on the model's current learning state.
arXiv Detail & Related papers (2025-09-16T06:35:13Z)
- Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [26.079123341965687]
We study low-rank learning and analyze how LoRA ranks and placements affect learning and forgetting. A higher-rank LoRA improves task learning (plasticity) but increases forgetting, while a lower-rank LoRA enhances stability but limits adaptation. Motivated by this, we propose Continual Dynamic Rank-Selective LoRA (CoDyRA), which continually updates PTMs with LoRA adapters of adaptively optimized ranks.
arXiv Detail & Related papers (2024-12-01T23:41:42Z)
- Temporal-Difference Variational Continual Learning [77.92320830700797]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates catastrophic forgetting, outperforming strong variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond [13.864609787260298]
We propose an uncertainty-aware memory-based approach to solve catastrophic forgetting. We retrieve samples with specific characteristics and, by retraining the model on such samples, demonstrate the potential of this approach.
arXiv Detail & Related papers (2024-05-29T09:29:39Z)
- PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Variance-Covariance Regularization Improves Representation Learning [28.341622247252705]
We adapt a self-supervised learning regularization technique to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg).
We demonstrate that VCReg significantly enhances transfer learning for images and videos, achieving state-of-the-art performance across numerous tasks and datasets.
In summary, VCReg offers a universally applicable regularization framework that significantly advances transfer learning and highlights the connection between gradient starvation, neural collapse, and feature transferability.
arXiv Detail & Related papers (2023-06-23T05:01:02Z)
- Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation [4.111899441919165]
It is necessary for the model to learn novel relational patterns from very few labeled examples while avoiding catastrophic forgetting of previous task knowledge.
We propose a novel method based on embedding space regularization and data augmentation.
Our method generalizes to new few-shot tasks and avoids catastrophic forgetting of previous tasks by enforcing extra constraints on the relational embeddings and by adding extra relevant data in a self-supervised manner.
arXiv Detail & Related papers (2022-03-04T05:19:09Z)
- Center Loss Regularization for Continual Learning [0.0]
In general, neural networks lack the ability to learn different tasks sequentially.
Our approach remembers old tasks by projecting the representations of new tasks close to those of old tasks.
We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
arXiv Detail & Related papers (2021-10-21T17:46:44Z)
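The last entry describes the center-loss idea in a single sentence, so the following is a rough, hypothetical sketch in the same spirit (not that paper's actual implementation) of a center-loss-style regularizer that pulls representations toward per-class centers recorded for previously seen classes. The `CenterRegularizer` class, the exponential-moving-average update, and the single-device assumption are illustrative choices.

```python
# Hypothetical center-loss-style regularizer for continual learning.
# Features of classes seen earlier are pulled toward stored class centers.
# For simplicity, all tensors are assumed to live on the same device.
import torch

class CenterRegularizer:
    def __init__(self, feature_dim: int, num_classes: int, momentum: float = 0.9):
        self.centers = torch.zeros(num_classes, feature_dim)    # one center per class
        self.seen = torch.zeros(num_classes, dtype=torch.bool)  # classes with a stored center
        self.momentum = momentum

    @torch.no_grad()
    def update(self, features: torch.Tensor, labels: torch.Tensor) -> None:
        """Update per-class centers with an exponential moving average."""
        for c in labels.unique():
            mean_feat = features[labels == c].mean(dim=0)
            if self.seen[c]:
                self.centers[c] = self.momentum * self.centers[c] + (1 - self.momentum) * mean_feat
            else:
                self.centers[c] = mean_feat
                self.seen[c] = True

    def penalty(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Mean squared distance between features and their stored class centers."""
        mask = self.seen[labels]      # only penalize classes that already have a center
        if not mask.any():
            return features.new_zeros(())
        target = self.centers[labels[mask]]
        return ((features[mask] - target) ** 2).sum(dim=1).mean()
```

In a rehearsal setting, `penalty(...)` would typically be added to the task loss with a small coefficient, while `update(...)` is called on the features of each class as it is first learned, so later tasks are discouraged from drifting away from those stored representations.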
This list is automatically generated from the titles and abstracts of the papers in this site.