Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt
Representation Drift in Continual Learning
- URL: http://arxiv.org/abs/2302.11344v1
- Date: Tue, 14 Feb 2023 16:35:54 GMT
- Title: Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt
Representation Drift in Continual Learning
- Authors: Fahad Sarfraz, Elahe Arani and Bahram Zonooz
- Abstract summary: We propose ESMER, which employs a principled mechanism to modulate error sensitivity in a dual-memory rehearsal-based system.
ESMER effectively reduces forgetting and abrupt drift in representations at the task boundary by gradually adapting to the new task while consolidating knowledge.
Remarkably, it also enables the model to learn under high levels of label noise, which is ubiquitous in real-world data streams.
- Score: 13.041607703862724
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans excel at lifelong learning, as the brain has evolved to be robust to
distribution shifts and noise in our ever-changing environment. Deep neural
networks (DNNs), however, exhibit catastrophic forgetting and the learned
representations drift drastically as they encounter a new task. This alludes to
a different error-based learning mechanism in the brain. Unlike DNNs, where
learning scales linearly with the magnitude of the error, the sensitivity to
errors in the brain decreases as a function of their magnitude. To this end, we
propose ESMER, which employs a principled mechanism to modulate error
sensitivity in a dual-memory rehearsal-based system. Concretely, it maintains a
memory of past errors and uses it to modify the learning dynamics so that the
model learns more from small consistent errors compared to large sudden errors.
We also propose Error-Sensitive Reservoir Sampling to maintain
episodic memory, which leverages the error history to pre-select low-loss
samples as candidates for the buffer, which are better suited for retaining
information. Empirical results show that ESMER effectively reduces forgetting
and abrupt drift in representations at the task boundary by gradually adapting
to the new task while consolidating knowledge. Remarkably, it also enables the
model to learn under high levels of label noise, which is ubiquitous in
real-world data streams.
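To make the two mechanisms concrete, below is a minimal Python/PyTorch sketch of (i) error-sensitivity modulation driven by a running memory of past errors and (ii) error-sensitive reservoir sampling that pre-selects low-loss samples as buffer candidates. This is an illustrative sketch based only on the abstract, not the authors' implementation: the class names, the EMA-based error memory, and the hyperparameters (beta, gamma, buffer_size) are assumptions.

```python
# Illustrative sketch of the ideas described in the abstract; hyperparameters
# and the exact weighting/thresholding rules are assumptions, not the paper's.
import random
import torch


class ErrorSensitiveLoss:
    """Modulates per-sample losses so that small, consistent errors drive
    learning more than large, sudden ones (e.g., at a task boundary or from
    noisy labels)."""

    def __init__(self, beta: float = 0.99, gamma: float = 1.5):
        self.beta = beta              # EMA decay for the memory of past errors
        self.gamma = gamma            # losses above gamma * memory count as "sudden"
        self.error_memory = None      # running estimate of the typical (low) loss

    def __call__(self, per_sample_loss: torch.Tensor) -> torch.Tensor:
        # per_sample_loss: 1-D tensor, e.g. cross_entropy(..., reduction="none")
        detached = per_sample_loss.detach()
        if self.error_memory is None:
            self.error_memory = detached.mean()
        threshold = self.gamma * self.error_memory
        # Down-weight sudden, large errors; small errors pass through unchanged.
        weights = torch.where(
            detached > threshold,
            threshold / (detached + 1e-8),
            torch.ones_like(detached),
        )
        # Update the error memory only from the well-behaved (low-loss) samples.
        low_loss = detached[detached <= threshold]
        if low_loss.numel() > 0:
            self.error_memory = (
                self.beta * self.error_memory + (1 - self.beta) * low_loss.mean()
            )
        return (weights * per_sample_loss).mean()


class ErrorSensitiveReservoir:
    """Reservoir sampling for the episodic memory that admits only low-loss
    samples as buffer candidates, following the idea in the abstract."""

    def __init__(self, buffer_size: int = 500):
        self.buffer_size = buffer_size
        self.buffer = []              # stored (x, y) pairs
        self.num_candidates = 0       # admitted candidates seen so far

    def add(self, x: torch.Tensor, y: torch.Tensor,
            sample_loss: float, error_memory: float) -> None:
        # High-loss samples (likely noisy or not yet learned) are not admitted.
        if sample_loss > error_memory:
            return
        self.num_candidates += 1
        if len(self.buffer) < self.buffer_size:
            self.buffer.append((x.clone(), y.clone()))
        else:
            idx = random.randint(0, self.num_candidates - 1)
            if idx < self.buffer_size:
                self.buffer[idx] = (x.clone(), y.clone())
```

In a dual-memory rehearsal setup, the working model's per-sample losses would pass through a modulated loss of this kind, and the same error memory could gate which incoming stream samples are offered to the reservoir.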
Related papers
- Neuromimetic metaplasticity for adaptive continual learning [2.1749194587826026]
We propose a metaplasticity model inspired by human working memory to achieve catastrophic forgetting-free continual learning.
A key aspect of our approach involves implementing distinct types of synapses from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility.
The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications.
arXiv Detail & Related papers (2024-07-09T12:21:35Z)
- Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation [0.8741284539870512]
We propose and study the implementation of Rockafellian Relaxation (RR) for neural network training.
RR can enhance standard neural network methods to achieve robust performance across classification tasks.
We find that RR can mitigate the effects of dataset corruption due to both (heavy) labeling error and/or adversarial perturbation.
arXiv Detail & Related papers (2024-05-30T23:13:01Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes replaying the data of experienced tasks when learning new tasks.
However, storing that data is often impractical due to memory constraints or data privacy issues.
As an alternative, data-free replay methods synthesize replay samples by inverting them from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues [0.0]
Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems.
In this work we follow a different approach where back-propagation makes exclusive use of the sign of the error signal to prime the learning.
We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform.
arXiv Detail & Related papers (2023-09-07T16:34:30Z)
- Continual Learning by Modeling Intra-Class Variation [33.30614232534283]
It has been observed that neural networks perform poorly when the data or tasks are presented sequentially.
Unlike humans, neural networks suffer greatly from catastrophic forgetting, making it impossible to perform life-long learning.
We examine memory-based continual learning and identify that large variation in the representation space is crucial for avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-10-11T12:17:43Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
- Representation Memorization for Fast Learning New Knowledge without Forgetting [36.55736909586313]
The ability to quickly learn new knowledge is a big step towards human-level intelligence.
We consider scenarios that require learning new classes or data distributions quickly and incrementally over time.
We propose "Memory-based Hebbian Adaptation" to tackle the two major challenges.
arXiv Detail & Related papers (2021-08-28T07:54:53Z)
- Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks.
ANV plays as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
arXiv Detail & Related papers (2020-11-12T06:06:33Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.