Related papers: Overcoming catastrophic forgetting in neural networks

URL: http://arxiv.org/abs/2507.10485v1
Date: Mon, 14 Jul 2025 17:04:05 GMT
Title: Overcoming catastrophic forgetting in neural networks
Authors: Brandon Shuen Yi Loke, Filippo Quadri, Gabriel Vivanco, Maximilian Casagrande, Saúl Fenollosa,
Abstract summary: Catastrophic forgetting is the primary challenge that hinders continual learning. Elastic Weight Consolidation is a regularization-based approach inspired by synaptic consolidation in biological neural systems. Our results confirm previous findings: EWC significantly reduces forgetting compared to naive training.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Catastrophic forgetting is the primary challenge that hinders continual learning, which refers to a neural network's ability to sequentially learn multiple tasks while retaining previously acquired knowledge. Elastic Weight Consolidation (EWC), a regularization-based approach inspired by synaptic consolidation in biological neural systems, has been used to overcome this problem. In this study, prior research is replicated and extended by evaluating EWC in supervised learning settings using the PermutedMNIST and RotatedMNIST benchmarks. Through systematic comparisons with L2 regularization and stochastic gradient descent (SGD) without regularization, we analyze how different approaches balance knowledge retention and adaptability. Our results confirm previous findings: EWC significantly reduces forgetting compared to naive training while slightly compromising learning efficiency on new tasks. Moreover, we investigate the impact of dropout regularization and varying hyperparameters, offering insights into the generalization of EWC across diverse learning scenarios. These results underscore EWC's potential as a viable solution for lifelong learning in neural networks.
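The abstract contrasts EWC with plain L2 regularization but gives no formulas. The sketch below illustrates that contrast using the commonly cited form of the EWC penalty, (lambda / 2) * sum_i F_i * (theta_i - theta_star_i)^2, where F_i is a diagonal Fisher information estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy parameter values, and the choice to model the L2 baseline as the same penalty with uniform weights are all hypothetical.

```python
def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Quadratic EWC-style penalty.

    theta       : current parameters (list of floats)
    theta_star  : parameters after training on the previous task
    fisher_diag : per-parameter importance (diagonal Fisher estimate)
    lam         : regularization strength
    """
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher_diag)
    )


def l2_penalty(theta, theta_star, lam):
    """Assumed L2 baseline: the same penalty with every parameter weighted equally."""
    return ewc_penalty(theta, theta_star, [1.0] * len(theta), lam)


# Hypothetical toy values: the first parameter matters a lot for the old task,
# the second barely matters.
theta_star = [1.0, -2.0]
fisher_diag = [10.0, 0.1]
theta = [1.5, -1.0]

ewc_value = ewc_penalty(theta, theta_star, fisher_diag, lam=1.0)
l2_value = l2_penalty(theta, theta_star, lam=1.0)
```

In this toy case the EWC-style penalty charges more for moving the first parameter and less for moving the second, whereas the uniform penalty treats both alike. During training on a new task, the penalty would be added to that task's loss.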

Related papers

Randomized Forward Mode Gradient for Spiking Neural Networks in Scientific Machine Learning [4.178826560825283]
Spiking neural networks (SNNs) represent a promising approach in machine learning, combining the hierarchical learning capabilities of deep neural networks with the energy efficiency of spike-based computations. Traditional end-to-end training of SNNs is often based on back-propagation, where weight updates are derived from gradients computed through the chain rule. This method encounters challenges due to its limited biological plausibility and inefficiencies on neuromorphic hardware. In this study, we introduce an alternative training approach for SNNs. Instead of using back-propagation, we leverage weight perturbation methods within a forward-mode
arXiv Detail & Related papers (2024-11-11T15:20:54Z)
Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
Learning Continually by Spectral Regularization [45.55508032009977]
Continual learning algorithms seek to mitigate loss of plasticity by sustaining good performance while maintaining network trainability. We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning. We present an experimental analysis that shows how the proposed spectral regularizer can sustain trainability and performance across a range of model architectures in continual supervised and reinforcement learning settings.
arXiv Detail & Related papers (2024-06-10T21:34:43Z)
SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks. We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training. We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations [36.24028295650668]
Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Deep neural networks are prone to forgetting previously learned information when learning new information. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years.
arXiv Detail & Related papers (2022-03-12T21:12:41Z)
Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples. This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
Improving Music Performance Assessment with Contrastive Learning [78.8942067357231]
This study investigates contrastive learning as a potential method to improve existing MPA systems. We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network. Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks.
arXiv Detail & Related papers (2021-08-03T19:24:25Z)
SpikePropamine: Differentiable Plasticity in Spiking Neural Networks [0.0]
We introduce a framework for learning the dynamics of synaptic plasticity and neuromodulated synaptic plasticity in Spiking Neural Networks (SNNs). We show that SNNs augmented with differentiable plasticity are sufficient for solving a set of challenging temporal learning tasks. These networks are also shown to be capable of producing locomotion on a high-dimensional robotic learning task.
arXiv Detail & Related papers (2021-06-04T19:29:07Z)
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.