Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling
- URL: http://arxiv.org/abs/2507.08736v1
- Date: Fri, 11 Jul 2025 16:38:40 GMT
- Title: Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling
- Authors: Idan Mashiach, Oren Glickman, Tom Tirer
- Abstract summary: Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks. Regularization approaches aim to identify and constrain "important" parameters to preserve previous knowledge.
- Score: 8.875650122536797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks due to knowledge overwriting. Among the approaches to mitigate this issue, regularization techniques aim to identify and constrain "important" parameters to preserve previous knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters that exhibit higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach achieves superior performance in balancing catastrophic forgetting mitigation with strong performance on newly learned tasks.
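The abstract describes an importance estimate built from parameter activity during the final training plateau: parameters that keep moving after the loss has flattened mark relatively flat directions that can absorb a new task, while quiet parameters should be constrained. A minimal sketch of that idea follows; it is not the authors' released code, and the model, data loader, plateau length, and the inverse-activity importance weighting are illustrative assumptions.
```python
# Minimal sketch of plateau-phase activity profiling (illustrative, not the authors' code).
# Assumes PyTorch and generic `model`, `loader`, `loss_fn`, `optimizer` objects.
import torch


def profile_plateau_activity(model, loader, loss_fn, optimizer, plateau_epochs=5):
    """Record per-parameter variability over the final (plateau) epochs of a task."""
    snapshots = {name: [] for name, _ in model.named_parameters()}
    for _ in range(plateau_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        for name, p in model.named_parameters():
            snapshots[name].append(p.detach().clone())
    # Activity = standard deviation of each parameter across the plateau snapshots.
    return {name: torch.stack(s).std(dim=0) for name, s in snapshots.items()}


def plateau_penalty(model, anchor, activity, eps=1e-8):
    """Quadratic penalty for the next task: weak along high-activity (flat)
    directions, strong along low-activity (presumably important) ones."""
    penalty = 0.0
    for name, p in model.named_parameters():
        importance = 1.0 / (activity[name] + eps)  # low activity -> high importance
        penalty = penalty + (importance * (p - anchor[name]) ** 2).sum()
    return penalty
```
When training on the next task, `plateau_penalty` would be added to the task loss with a regularization coefficient, in the spirit of EWC-style penalties; the exact weighting used in the paper may differ.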
Related papers
- Learning without Isolation: Pathway Protection for Continual Learning [64.3476595369537]
Deep networks are prone to catastrophic forgetting during sequential task learning. We propose a novel CL framework, learning without isolation (LwI), where model fusion is formulated as graph matching. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways for a new task, realizing pathway protection and addressing catastrophic forgetting.
arXiv Detail & Related papers (2025-05-24T07:16:55Z)
- Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation [3.6683171094134805]
Continual learning involves acquiring knowledge of multiple tasks over an extended period.
Most approaches to this problem seek a balance between maximizing performance on the new tasks and minimizing the forgetting of previous tasks.
Our approach attempts to maximize the performance of the new task, while ensuring zero forgetting.
arXiv Detail & Related papers (2023-11-26T12:36:05Z)
- Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy that makes fine-tuning more effective: the pre-trained part of the network is left frozen, and learning is restricted to the usual classification head plus a set of newly introduced learnable input-space parameters (a sketch follows this entry).
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
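For the input-space tuning entry above, a minimal sketch of the general recipe: freeze the pretrained backbone and train only a new classification head plus learnable parameters added in the input space. This illustrates the idea rather than the paper's implementation; the ResNet-18 backbone, the additive "input prompt" parameterization, and all names are assumptions.
```python
# Sketch of input-space tuning with a frozen backbone (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torchvision


class InputTunedClassifier(nn.Module):
    def __init__(self, num_classes, image_size=224):
        super().__init__()
        self.backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        for p in self.backbone.parameters():          # keep the pre-trained part frozen
            p.requires_grad = False
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()              # expose backbone features
        # Newly introduced learnable parameters living in the input space.
        self.input_prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        self.head = nn.Linear(feat_dim, num_classes)  # the usual classification head

    def forward(self, x):
        return self.head(self.backbone(x + self.input_prompt))


model = InputTunedClassifier(num_classes=10)
trainable = [p for p in model.parameters() if p.requires_grad]  # prompt + head only
optimizer = torch.optim.SGD(trainable, lr=1e-2)
```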
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT), which projects the back-propagated gradients onto the low-rank subspace spanned by the eigenvectors of the early-stage gradient flow throughout training (a projection sketch follows this entry).
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
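A minimal sketch of the gradient-projection step described in the SubPT entry above: collect early-stage gradients, take their top singular directions, and project later gradients onto that subspace before the optimizer step. The rank, the number of warm-up steps, and the flattening of gradients into one vector are assumptions for illustration, not the SubPT implementation.
```python
# Sketch of projecting gradients onto an early-stage low-rank subspace
# (illustrative of the general technique, not the SubPT implementation).
import torch


def top_eigenvectors(grad_history, rank=4):
    """grad_history: list of flattened gradient vectors from early training steps."""
    G = torch.stack(grad_history)             # (num_steps, num_params)
    # Right singular vectors of G span the dominant gradient-flow directions.
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:rank]                          # (rank, num_params)


def project_gradient(grad, basis):
    """Project a flattened gradient onto the subspace spanned by the rows of `basis`."""
    coeffs = basis @ grad                     # (rank,)
    return basis.t() @ coeffs                 # back to parameter space


# Hypothetical use inside a training loop:
# flat_grad = torch.cat([p.grad.flatten() for p in model.parameters()])
# if step < warmup_steps:
#     grad_history.append(flat_grad.clone())
# else:
#     flat_grad = project_gradient(flat_grad, basis)  # then scatter back into p.grad
```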
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to activate and select only a sparse set of neurons for learning the current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning [67.99349091593324]
We investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario.
Our proposed method consistently outperforms baselines with the superior ability to learn new skills while alleviating forgetting effectively.
arXiv Detail & Related papers (2021-10-09T15:13:44Z)
- DIODE: Dilatable Incremental Object Detection [15.59425584971872]
Conventional deep learning models lack the capability of preserving previously learned knowledge.
We propose a dilatable incremental object detector (DIODE) for multi-step incremental detection tasks.
Our method achieves up to 6.4% performance improvement by increasing the number of parameters by just 1.2% for each newly learned task.
arXiv Detail & Related papers (2021-08-12T09:45:57Z)
- Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
- Continual Learning via Bit-Level Information Preserving [88.32450740325005]
We study the continual learning process through the lens of information theory.
We propose Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters.
BLIP achieves close to zero forgetting while only requiring constant memory overheads throughout continual learning.
arXiv Detail & Related papers (2021-05-10T15:09:01Z)
- Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)