Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning
- URL: http://arxiv.org/abs/2110.04593v1
- Date: Sat, 9 Oct 2021 15:13:44 GMT
- Title: Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning
- Authors: Danruo Deng, Guangyong Chen, Jianye Hao, Qiong Wang, Pheng-Ann Heng
- Abstract summary: We investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario.
Our proposed method consistently outperforms baselines with the superior ability to learn new skills while alleviating forgetting effectively.
- Score: 67.99349091593324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backpropagation networks are notably susceptible to catastrophic
forgetting, where networks tend to forget previously learned skills upon
learning new ones. To address this 'sensitivity-stability' dilemma, most
previous efforts have focused on minimizing the empirical risk with
different parameter regularization terms and episodic memory, but have rarely
explored the use of the weight loss landscape. In this paper, we
investigate the relationship between the weight loss landscape and
sensitivity-stability in the continual learning scenario and, based on this, we
propose a novel method, Flattening Sharpness for Dynamic Gradient Projection
Memory (FS-DGPM). In particular, we introduce a soft weight to represent the
importance of each basis representing past tasks in GPM, which can be
adaptively learned during the learning process, so that less important bases
can be dynamically released to improve the sensitivity of new skill learning.
We further introduce Flattening Sharpness (FS) to reduce the generalization gap
by explicitly regulating the flatness of the weight loss landscape of all seen
tasks. As demonstrated empirically, our proposed method consistently
outperforms baselines with the superior ability to learn new skills while
alleviating forgetting effectively.
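A minimal sketch of the soft-weighted projection idea from the abstract, in plain Python. It assumes that, as in GPM, past tasks are summarized by orthonormal basis vectors m_i, and that each basis carries an importance lambda_i in [0, 1]; the update then removes only a lambda_i-scaled share of the gradient along each basis, g - sum_i lambda_i <g, m_i> m_i. The sigmoid parameterization and the names `soft_weights` and `project_gradient` are hypothetical choices for illustration, not the authors' code, and how the importances are learned is not shown.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def soft_weights(logits: Vector) -> Vector:
    """Map hypothetical unconstrained learnable logits to importances in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def project_gradient(grad: Vector, bases: List[Vector], importances: Vector) -> Vector:
    """Return g - sum_i lambda_i * <g, m_i> * m_i.

    `bases` are assumed orthonormal (as in GPM). lambda_i = 1 removes the
    gradient component along basis i entirely (hard GPM projection);
    lambda_i = 0 leaves that direction free for learning the new task.
    """
    projected = list(grad)
    for m, lam in zip(bases, importances):
        coeff = lam * dot(grad, m)
        projected = [p - coeff * mi for p, mi in zip(projected, m)]
    return projected


# Toy example: two orthonormal bases spanning the first two axes of R^3.
bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grad = [2.0, 3.0, 4.0]

hard = project_gradient(grad, bases, [1.0, 1.0])  # plain GPM: [0.0, 0.0, 4.0]
soft = project_gradient(grad, bases, [1.0, 0.5])  # second basis half released: [0.0, 1.5, 4.0]
```

Lowering an importance toward 0 "releases" that basis, so the new task may again move the weights along it; this is the sensitivity gain the abstract describes, traded against some stability on the old task.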
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
(arXiv, 2024-10-14T16:27:03Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Identifying overly fast representation learning and a biased classification layer as the sources of this problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner that selectively reduces the learning rate of backbone parameters, and a Classifier Alignment that aligns the disjoint classification layers in a post-hoc fashion.
(arXiv, 2024-08-15T17:50:07Z)
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization, which we call Normalize-and-Project.
(arXiv, 2024-07-01T20:58:01Z)
- Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy to make the fine-tuning procedure more effective: it leaves the pre-trained part of the network untouched and learns not only the usual classification head but also a set of newly introduced learnable parameters.
(arXiv, 2023-06-05T15:11:59Z)
- Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
(arXiv, 2022-05-27T16:32:43Z)
- Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation [39.97128550414934]
We present a novel class-incremental learning approach based on deep neural networks.
It continually learns new tasks with limited memory for storing examples from previous tasks.
Our algorithm is based on knowledge distillation and provides a principled way to maintain the representations of old models.
(arXiv, 2022-04-02T16:30:04Z)
- Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping [10.970706194360451]
Catastrophic forgetting in neural networks is a significant problem for continual learning.
We introduce Relevance Mapping Networks (RMNs), which are inspired by the Optimal Overlap Hypothesis.
We show that RMNs learn an optimized representational overlap that overcomes the twin problem of catastrophic forgetting and remembering.
(arXiv, 2021-02-22T20:34:00Z)
- Enabling Continual Learning with Differentiable Hebbian Plasticity [18.12749708143404]
Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge.
Catastrophic forgetting poses a grand challenge for neural networks performing such a learning process.
We propose a Differentiable Hebbian Consolidation model which is composed of a Differentiable Hebbian Plasticity.
(arXiv, 2020-06-30T06:42:19Z)
- Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size in forming training regimes that widen the tasks' local minima.
(arXiv, 2020-06-12T06:00:27Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of a "break-even" point on the optimization trajectory of deep neural networks.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in poor conditioning of the loss surface, even for a neural network with batch normalization layers.
(arXiv, 2020-02-21T22:55:51Z)
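The basic SAM update mentioned in the Sharpness-Aware Training for Free entry can be sketched as a two-step procedure: move to an approximate worst-case point within an L2 ball of radius rho along the normalized gradient, then descend from the original weights using the gradient measured there. This is a toy sketch of standard SAM under stated assumptions (a hand-written quadratic loss, hypothetical helper names `sam_step`, `quad_grad`, `toy_loss`, and arbitrary `lr`/`rho` values); it is neither SAF's method nor the exact Flattening Sharpness procedure of FS-DGPM, only the common starting point for sharpness-aware training.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """One SAM-style update using the usual first-order approximation.

    1. eps = rho * g / ||g||   : approximate worst-case perturbation in an L2 ball.
    2. g_sharp = grad(w + eps) : gradient at the perturbed weights.
    3. w <- w - lr * g_sharp   : descend from the ORIGINAL weights.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction to perturb along
    eps = [rho * x / norm for x in g]
    g_sharp = grad_fn([wi + ei for wi, ei in zip(w, eps)])
    return [wi - lr * gi for wi, gi in zip(w, g_sharp)]


# Hypothetical toy loss: sharp along w[0] (curvature 10), flat along w[1] (curvature 1).
def toy_loss(w: Vector) -> float:
    return 0.5 * (10.0 * w[0] ** 2 + 1.0 * w[1] ** 2)


def quad_grad(w: Vector) -> Vector:
    return [10.0 * w[0], 1.0 * w[1]]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, quad_grad, lr=0.05, rho=0.05)
```

Because the perturbation always has length rho, in this toy run the iterate does not settle exactly at the minimum; it ends up in a small neighborhood around it whose size depends on rho and lr.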
This list is automatically generated from the titles and abstracts of the papers on this site.