Architecture Matters in Continual Learning
- URL: http://arxiv.org/abs/2202.00275v1
- Date: Tue, 1 Feb 2022 08:32:22 GMT
- Title: Architecture Matters in Continual Learning
- Authors: Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Timothy Nguyen, Razvan
Pascanu, Dilan Gorur, Mehrdad Farajtabar
- Abstract summary: We show that the choice of architecture can significantly impact continual learning performance.
Our findings yield best practices and recommendations that can improve continual learning performance.
- Score: 43.36462900350999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large body of research in continual learning is devoted to overcoming
catastrophic forgetting in neural networks by designing new algorithms that are
robust to distribution shifts. However, the majority of these works are
strictly focused on the "algorithmic" part of continual learning for a "fixed
neural network architecture", and the implications of using different
architectures are mostly neglected. Even the few existing continual learning
methods that modify the model assume a fixed architecture and aim to develop an
algorithm that efficiently uses the model throughout the learning experience.
In this work, we show that the choice of architecture can significantly impact
continual learning performance, and that different architectures lead to
different trade-offs between the ability to remember previous tasks and to
learn new ones. Moreover, we study the impact of various architectural
decisions, and our findings yield best practices and recommendations that can
improve continual learning performance.
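The trade-off the abstract describes is commonly quantified by two numbers: average accuracy over all tasks at the end of training (how well new tasks are learned) and average forgetting, the drop from each task's best accuracy to its final accuracy (how well old tasks are remembered). The sketch below is not the authors' code; it is a minimal, hypothetical PyTorch example on synthetic tasks, with illustrative architectures and hyper-parameters, showing how two architectures could be compared on these two metrics under plain sequential fine-tuning.

```python
import torch
import torch.nn as nn


def make_tasks(n_tasks=3, n=512, dim=20, seed=0):
    # Synthetic binary tasks: each task is a random linear rule in R^dim.
    g = torch.Generator().manual_seed(seed)
    tasks = []
    for _ in range(n_tasks):
        w = torch.randn(dim, generator=g)
        x = torch.randn(n, dim, generator=g)
        tasks.append((x, (x @ w > 0).long()))
    return tasks


def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


def train_sequentially(model, tasks, epochs=100, lr=0.05):
    # Plain SGD fine-tuning on one task after another (no continual learning
    # algorithm), so metric differences come from the architecture alone.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    T = len(tasks)
    acc = torch.zeros(T, T)  # acc[i, j] = accuracy on task j after finishing task i
    for i, (x, y) in enumerate(tasks):
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        for j, (xj, yj) in enumerate(tasks[: i + 1]):
            acc[i, j] = accuracy(model, xj, yj)
    # Average final accuracy over all tasks (learning new tasks).
    avg_accuracy = acc[T - 1].mean().item()
    # Average drop from best to final accuracy, excluding the last task
    # (remembering previous tasks).
    forgetting = (acc.max(dim=0).values[:-1] - acc[T - 1, :-1]).mean().item()
    return avg_accuracy, forgetting


if __name__ == "__main__":
    tasks = make_tasks()
    dim = tasks[0][0].shape[1]
    # Illustrative candidate architectures; not the configurations from the paper.
    candidates = {
        "narrow-deep MLP ": nn.Sequential(nn.Linear(dim, 32), nn.ReLU(),
                                          nn.Linear(32, 32), nn.ReLU(),
                                          nn.Linear(32, 2)),
        "wide-shallow MLP": nn.Sequential(nn.Linear(dim, 512), nn.ReLU(),
                                          nn.Linear(512, 2)),
    }
    for name, model in candidates.items():
        avg_acc, fgt = train_sequentially(model, tasks)
        print(f"{name}  avg accuracy = {avg_acc:.3f}  avg forgetting = {fgt:.3f}")
```

The printed numbers will not match the paper's benchmarks; the point is only that the same sequential training protocol applied to different architectures yields different accuracy/forgetting pairs.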
Related papers
- The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z)
- Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z)
- Wide Neural Networks Forget Less Catastrophically [39.907197907411266]
We study the impact of the "width" of the neural network architecture on catastrophic forgetting.
We study the learning dynamics of the network from various perspectives.
arXiv Detail & Related papers (2021-10-21T23:49:23Z)
- Learn to Bind and Grow Neural Structures [0.3553493344868413]
We present a new framework, Learn to Bind and Grow, which learns a neural architecture for a new task incrementally.
Central to our approach is a novel, interpretable, parameterization of the shared, multi-task architecture space.
Experiments on continual learning benchmarks show that our framework performs comparably with earlier expansion-based approaches.
arXiv Detail & Related papers (2020-11-21T09:40:26Z)
- D2RL: Deep Dense Architectures in Reinforcement Learning [47.67475810050311]
We take inspiration from successful architectural choices in computer vision and generative modelling.
We investigate the use of deeper networks and dense connections for reinforcement learning on a variety of simulated robotic learning benchmark environments.
arXiv Detail & Related papers (2020-10-19T01:27:07Z)
- NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior.
We propose to search for neural architectures that capture stronger image priors.
We search for an improved network by leveraging an existing neural architecture search algorithm.
arXiv Detail & Related papers (2020-08-26T17:59:36Z)
- Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
arXiv Detail & Related papers (2020-06-24T00:26:35Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
- Learning to Rank Learning Curves [15.976034696758148]
We present a new method that saves computational budget by terminating poor configurations early on in the training.
We show that our model is able to effectively rank learning curves without having to observe many or very long learning curves.
arXiv Detail & Related papers (2020-06-05T10:49:52Z)
- Disturbance-immune Weight Sharing for Neural Architecture Search [96.93812980299428]
We propose a disturbance-immune update strategy for model updating.
We theoretically analyze the effectiveness of our strategy in alleviating the performance disturbance risk.
arXiv Detail & Related papers (2020-03-29T17:54:49Z)