Understanding the Role of Training Regimes in Continual Learning
- URL: http://arxiv.org/abs/2006.06958v1
- Date: Fri, 12 Jun 2020 06:00:27 GMT
- Title: Understanding the Role of Training Regimes in Continual Learning
- Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, Hassan
Ghasemzadeh
- Abstract summary: Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
- Score: 51.32945003239048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting affects the training of neural networks, limiting
their ability to learn multiple tasks sequentially. From the perspective of the
well-established plasticity-stability dilemma, neural networks tend to be
overly plastic, lacking the stability necessary to prevent the forgetting of
previous knowledge, which means that as learning progresses, networks tend to
forget previously seen tasks. This phenomenon, coined in the continual learning
literature, has attracted much attention lately, and several families of
approaches have been proposed with different degrees of success. However, there
has been limited prior work extensively analyzing the impact that different
training regimes -- learning rate, batch size, regularization method -- can have
on forgetting. In this work, we depart from the typical approach of altering
the learning algorithm to improve stability. Instead, we hypothesize that the
geometrical properties of the local minima found for each task play an
important role in the overall degree of forgetting. In particular, we study the
effect of dropout, learning rate decay, and batch size, on forming training
regimes that widen the tasks' local minima and, consequently, on helping the
network avoid forgetting catastrophically. Our study provides practical insights to improve
stability via simple yet effective techniques that outperform alternative
baselines.
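The wide-minima hypothesis above admits a simple second-order picture: near a task's minimum the loss is roughly quadratic, so the forgetting caused by a later weight shift scales with the curvature (sharpness) of that minimum. A minimal sketch with hypothetical numbers (not the paper's experiments):

```python
# Toy second-order picture of forgetting (illustrative numbers only).
# Near task 1's minimum w1, approximate its loss as
#   L(w) ~= 0.5 * curvature * (w - w1)**2.
# Training on task 2 later shifts the weights by `delta`; the forgetting
# on task 1 is the resulting loss increase.

def forgetting(curvature, delta):
    """Loss increase on task 1 after the weights move `delta` away from w1."""
    return 0.5 * curvature * delta ** 2

delta = 1.0                                       # same shift caused by task 2
sharp = forgetting(curvature=10.0, delta=delta)   # narrow (sharp) minimum
flat = forgetting(curvature=0.1, delta=delta)     # wide (flat) minimum

print(sharp, flat)  # 5.0 vs 0.05: the wide minimum forgets 100x less
```

Under this picture, dropout, learning rate decay, and batch size choices help precisely insofar as they lower the curvature at each task's minimum.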
Related papers
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
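The claimed equivalence between parameter-norm growth and effective learning-rate decay can be sketched with a scale-invariant toy function standing in for a normalized layer (a hypothetical construction, not the paper's code): scaling the weights leaves the output unchanged but shrinks the gradient proportionally, so a fixed step size behaves like a decayed one as the norm grows.

```python
# Scale-invariant stand-in for a normalized layer (hypothetical toy):
# f(w) = <w, x> / ||w||. Scaling w by c leaves the output unchanged but
# scales the gradient by 1/c, so with a fixed learning rate the effective
# step shrinks as the weight norm grows.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(v):
    return dot(v, v) ** 0.5

def f(w, x):
    return dot(w, x) / norm(w)

def grad_f(w, x):
    # Analytic gradient of <w, x> / ||w|| with respect to w.
    n = norm(w)
    s = dot(w, x)
    return [xi / n - s * wi / n**3 for xi, wi in zip(x, w)]

w = [0.3, -1.2, 0.5, 0.8]
x = [1.0, 0.4, -0.7, 0.2]
c = 10.0
cw = [c * wi for wi in w]

print(abs(f(w, x) - f(cw, x)) < 1e-12)           # True: output unchanged
ratio = norm(grad_f(cw, x)) / norm(grad_f(w, x))
print(round(ratio, 6))                           # 0.1: gradient shrinks by 1/c
```

Projecting the weights back to a fixed norm after each update (the "project" step) then makes the schedule explicit rather than implicit.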
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Keep Moving: identifying task-relevant subspaces to maximise plasticity for newly learned tasks [0.22499166814992438]
Continual learning algorithms strive to acquire new knowledge while preserving prior information.
Often, these algorithms emphasise stability and restrict network updates upon learning new tasks.
But is all change detrimental?
We propose that activation spaces in neural networks can be decomposed into two subspaces.
arXiv Detail & Related papers (2023-10-07T08:54:43Z)
- Continual Learning by Modeling Intra-Class Variation [33.30614232534283]
It has been observed that neural networks perform poorly when the data or tasks are presented sequentially.
Unlike humans, neural networks suffer greatly from catastrophic forgetting, making it impossible to perform lifelong learning.
We examine memory-based continual learning and identify that large variation in the representation space is crucial for avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-10-11T12:17:43Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Continual Learning with Neuron Activation Importance [1.7513645771137178]
Continual learning is a form of online learning over multiple sequential tasks.
One of the critical barriers in continual learning is that a network must learn a new task while keeping the knowledge of old tasks, without access to any data from those old tasks.
We propose a neuron activation importance-based regularization method for stable continual learning regardless of the order of tasks.
arXiv Detail & Related papers (2021-07-27T08:09:32Z)
- Few-shot Continual Learning: a Brain-inspired Approach [34.306678703379944]
We provide a first systematic study on few-shot continual learning (FSCL) and present an effective solution with deep neural networks.
Our solution is based on the observation that continual learning of a task sequence inevitably interferes with few-shot generalization.
We draw inspirations from the robust brain system and develop a method that (1) interdependently updates a pair of fast / slow weights for continual learning and few-shot learning to disentangle their divergent objectives, inspired by the biological model of meta-plasticity and fast / slow synapse; and (2) applies a brain-inspired two-step consolidation strategy to learn a task sequence without forgetting in the
arXiv Detail & Related papers (2021-04-19T03:40:48Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.