Wide Neural Networks Forget Less Catastrophically
- URL: http://arxiv.org/abs/2110.11526v1
- Date: Thu, 21 Oct 2021 23:49:23 GMT
- Title: Wide Neural Networks Forget Less Catastrophically
- Authors: Seyed Iman Mirzadeh, Arslan Chaudhry, Huiyi Hu, Razvan Pascanu, Dilan
Gorur, Mehrdad Farajtabar
- Abstract summary: We study the impact of "width" of the neural network architecture on catastrophic forgetting.
We study the learning dynamics of the network from various perspectives.
- Score: 39.907197907411266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing body of research in continual learning is devoted to overcoming the
"Catastrophic Forgetting" of neural networks by designing new algorithms that
are more robust to the distribution shifts. While the recent progress in
continual learning literature is encouraging, our understanding of what
properties of neural networks contribute to catastrophic forgetting is still
limited. To address this, instead of focusing on continual learning algorithms,
in this work, we focus on the model itself and study the impact of "width" of
the neural network architecture on catastrophic forgetting, and show that width
has a surprisingly significant effect on forgetting. To explain this effect, we
study the learning dynamics of the network from various perspectives such as
gradient norm and sparsity, orthogonalization, and lazy training regime. We
provide potential explanations that are consistent with the empirical results
across different architectures and continual learning benchmarks.
Related papers
- Collective variables of neural networks: empirical time evolution and scaling laws [0.535514140374842]
We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network.
Results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies.
arXiv Detail & Related papers (2024-10-09T21:37:14Z) - From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks [47.13391046553908]
In artificial networks, the effectiveness of these models relies on their ability to build task specific representation.
Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically.
These solutions capture the evolution of representations and the Neural Kernel across the spectrum from the rich to the lazy regimes.
arXiv Detail & Related papers (2024-09-22T23:19:04Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z) - The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - WeightScale: Interpreting Weight Change in Neural Networks [0.0]
We present an approach to interpret learning in neural networks by measuring relative weight change on a per layer basis.
We use this approach to investigate learning in the context of vision tasks across a variety of state-of-the-art networks.
arXiv Detail & Related papers (2021-07-07T21:18:38Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Brain-Inspired Learning on Neuromorphic Substrates [5.279475826661643]
This article provides a mathematical framework for the design of practical online learning algorithms for neuromorphic substrates.
Specifically, we show a direct connection between Real-Time Recurrent Learning (RTRL) and biologically plausible learning rules for training Spiking Neural Networks (SNNs)
We motivate a sparse approximation based on block-diagonal Jacobians, which reduces the algorithm's computational complexity.
arXiv Detail & Related papers (2020-10-22T17:56:59Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.