Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting
- URL: http://arxiv.org/abs/2410.00868v1
- Date: Tue, 1 Oct 2024 17:03:56 GMT
- Title: Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting
- Authors: Bo Liu, Mao Ye, Peter Stone, Qiang Liu
- Abstract summary: Gradient Episodic Memory (GEM) achieves balance by utilizing a subset of past training samples to restrict the update direction of the model parameters.
We show that memory strength is effective mainly because it improves GEM's generalization ability and therefore leads to a more favorable trade-off.
- Score: 41.891312602770746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental challenge in continual learning is to balance the trade-off between learning new tasks and remembering the previously acquired knowledge. Gradient Episodic Memory (GEM) achieves this balance by utilizing a subset of past training samples to restrict the update direction of the model parameters. In this work, we start by analyzing an often overlooked hyper-parameter in GEM, the memory strength, which boosts the empirical performance by further constraining the update direction. We show that memory strength is effective mainly because it improves GEM's generalization ability and therefore leads to a more favorable trade-off. By this finding, we propose two approaches that more flexibly constrain the update direction. Our methods are able to achieve uniformly better Pareto Frontiers of remembering old and learning new knowledge than using memory strength. We further propose a computationally efficient method to approximately solve the optimization problem with more constraints.
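To make the role of memory strength concrete, here is a minimal sketch of a GEM-style gradient restriction for the special case of a single memory gradient, where the projection has a closed form. It assumes that memory strength enters as a lower bound on the dual variable of the projection problem; that reading, and the names `restrict_gradient`, `mem_grad`, and `memory_strength`, are illustrative assumptions rather than the authors' implementation. The general case with several memory gradients requires a quadratic-program solver and is not shown.

```python
def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def restrict_gradient(grad, mem_grad, memory_strength=0.0):
    """Single-constraint GEM-style restriction (hypothetical helper).

    Returns grad + v * mem_grad, where v minimizes
    0.5 * ||mem_grad||^2 * v^2 + <mem_grad, grad> * v  subject to  v >= memory_strength.
    Assumption: memory strength acts as a lower bound on the dual variable v.
    """
    p = dot(mem_grad, mem_grad)
    if p == 0.0:
        # A zero memory gradient imposes no restriction.
        return list(grad)
    b = dot(mem_grad, grad)
    # Unconstrained minimizer is -b / p; clip it from below at memory_strength.
    v = max(memory_strength, -b / p)
    return [g + v * m for g, m in zip(grad, mem_grad)]


# Conflicting case: the new-task gradient points against the memory gradient,
# so the conflicting component is removed.
conflicting = restrict_gradient([1.0, -1.0], [0.0, 1.0])

# Non-conflicting case: a positive memory strength still pushes the update
# further toward the memory gradient.
strengthened = restrict_gradient([1.0, 0.5], [0.0, 1.0], memory_strength=0.5)
```

With `memory_strength=0.0` the update is changed only when it conflicts with the memory gradient; a positive value constrains the direction further even when there is no conflict, which is the knob the abstract analyzes.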
Related papers
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and long-term.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Continual Learning via Manifold Expansion Replay [36.27348867557826]
Catastrophic forgetting is a major challenge to continual learning.
We propose a novel replay strategy called Manifold Expansion Replay (MaER).
We show that the proposed method significantly improves accuracy in the continual learning setup, outperforming the state of the art.
arXiv Detail & Related papers (2023-10-12T05:09:27Z)
- EMO: Episodic Memory Optimization for Few-Shot Meta-Learning [69.50380510879697]
Episodic memory optimization for meta-learning, which we call EMO, is inspired by the human ability to recall past learning experiences from the brain's memory.
EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative.
EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
arXiv Detail & Related papers (2023-06-08T13:39:08Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and can, in some instances, outperform the well-known first-order Adam optimizer.
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
- Gradient Episodic Memory with a Soft Constraint for Continual Learning [9.52644009921388]
Catastrophic forgetting is a large drop in performance on previous tasks that occurs while the model is learning a new task.
We propose an average gradient episodic memory (A-GEM) with a soft constraint $\epsilon \in [0, 1]$, which is a balance factor between learning new knowledge and preserving learned knowledge.
$\epsilon$-SOFT-GEM outperforms A-GEM and several continual learning benchmarks in a single training epoch.
arXiv Detail & Related papers (2020-11-16T09:06:09Z)
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
arXiv Detail & Related papers (2020-10-04T10:05:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.