Statistical Mechanical Analysis of Catastrophic Forgetting in Continual
Learning with Teacher and Student Networks
- URL: http://arxiv.org/abs/2105.07385v1
- Date: Sun, 16 May 2021 09:02:48 GMT
- Title: Statistical Mechanical Analysis of Catastrophic Forgetting in Continual
Learning with Teacher and Student Networks
- Authors: Haruka Asanuma, Shiro Takagi, Yoshihiro Nagano, Yuki Yoshida, Yasuhiko
Igarashi, and Masato Okada
- Abstract summary: When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences.
We provide a theoretical framework for analyzing catastrophic forgetting using teacher-student learning.
We find that the network can avoid catastrophic forgetting when the similarity between the tasks' input distributions is small and the similarity between the target functions' input-output relationships is large.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a computational system continuously learns from an ever-changing
environment, it rapidly forgets its past experiences. This phenomenon is called
catastrophic forgetting. Although many methods have been proposed for avoiding
catastrophic forgetting, most are based on intuitive insights into the
phenomenon, and their performance has been evaluated only through numerical
experiments on benchmark datasets. In this study, we therefore provide a
theoretical framework for analyzing catastrophic forgetting using
teacher-student learning. Teacher-student learning is a framework with two
neural networks: one serves as the target function of supervised learning (the
teacher), and the other is the network that learns it (the student). To analyze
continual learning in the teacher-student framework, we characterize task
similarity by the similarity of the tasks' input distributions and the
similarity of the target functions' input-output relationships. Within this
framework, we also give a qualitative account of how a single-layer linear
student network forgets tasks. Based on the analysis, we find that the network
can avoid catastrophic forgetting when the similarity between the input
distributions is small and the similarity between the target functions'
input-output relationships is large. The analysis also suggests that the system
often exhibits a characteristic phenomenon called overshoot: even if the
student has once undergone catastrophic forgetting, it can recover and perform
reasonably well on the past task after further learning of the current task.
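To make the setup concrete, below is a minimal numerical sketch of continual teacher-student learning with a single-layer linear student. It is not the paper's analytical order-parameter calculation: the parameters `rho` (similarity of the two teachers' input-output mappings), `eta` (overlap of the two tasks' input distributions), and all hyperparameters are illustrative assumptions. For a linear student with weights w and a teacher w*, the generalization error on a task with input covariance Sigma is E[(w^T x - w*^T x)^2] = (w - w*)^T Sigma (w - w*); the sketch estimates this by Monte Carlo while the student learns task A and then task B, so that forgetting of task A can be tracked.

```python
# Minimal numerical sketch (not the paper's analytical solution) of continual
# teacher-student learning with a single-layer linear student. The similarity
# parameters `rho` (teacher input-output similarity) and `eta` (input-
# distribution overlap) and all hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 100        # input dimension
k = d // 2     # number of input coordinates active in each task
lr = 0.01      # online SGD learning rate (kept well below ~2/k for stability)
steps = 5000   # SGD steps per task

rho = 0.8      # similarity of the two teachers' input-output mappings
eta = 0.2      # fraction of input coordinates shared by the two tasks

# Teacher A, and a teacher B correlated with it: w_B = rho*w_A + sqrt(1-rho^2)*noise.
w_A = rng.standard_normal(d) / np.sqrt(d)
w_B = rho * w_A + np.sqrt(1.0 - rho**2) * rng.standard_normal(d) / np.sqrt(d)

# Input distributions: each task activates a block of k coordinates; the blocks
# overlap in about eta*k coordinates, controlling input-distribution similarity.
mask_A = np.zeros(d)
mask_A[:k] = 1.0
shift = int(round((1.0 - eta) * k))
mask_B = np.zeros(d)
mask_B[shift:shift + k] = 1.0

def sample(mask):
    """Draw one input restricted to the task's active coordinates."""
    return rng.standard_normal(d) * mask

def gen_error(w, w_teacher, mask, n=2000):
    """Monte-Carlo estimate of the generalization error on one task."""
    X = rng.standard_normal((n, d)) * mask
    return float(np.mean((X @ w - X @ w_teacher) ** 2))

w = np.zeros(d)  # student weights

# Phase 1: learn task A by online SGD on the squared loss.
for _ in range(steps):
    x = sample(mask_A)
    w -= lr * (w @ x - w_A @ x) * x

err_A = gen_error(w, w_A, mask_A)

# Phase 2: learn task B while tracking the error on task A, to observe
# forgetting and any later recovery (the overshoot described in the abstract).
history = []
for t in range(steps):
    x = sample(mask_B)
    w -= lr * (w @ x - w_B @ x) * x
    if t % 100 == 0:
        history.append(gen_error(w, w_A, mask_A))
history.append(gen_error(w, w_A, mask_A))

print(f"error on task A after learning A:       {err_A:.4f}")
print(f"worst error on task A while learning B: {max(history):.4f}")
print(f"final error on task A after learning B: {history[-1]:.4f}")
```

Sweeping `rho` and `eta` in such a sketch is one way to probe, numerically, the abstract's claim that forgetting is mild when input-distribution similarity is small and teacher similarity is large.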
Related papers
- Coding schemes in neural networks learning classification tasks (2024-06-24): We investigate fully-connected, wide neural networks learning classification tasks. We show that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
- Dynamical stability and chaos in artificial neural network trajectories along training (2024-04-08): We study the dynamical properties of the training process by analyzing, through this lens, the network trajectories of a shallow neural network. We find hints of regular and chaotic behavior depending on the learning-rate regime. This work also contributes to the cross-fertilization of ideas among dynamical systems theory, network theory, and machine learning.
- Provable Guarantees for Neural Networks via Gradient Feature Learning (2023-10-19): This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from prototypical gradients, and its effectiveness is demonstrated by applications to several problems.
- Critical Learning Periods for Multisensory Integration in Deep Networks (2022-10-06): We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
- The Neural Race Reduction: Dynamics of Abstraction in Gated Networks (2022-07-21): We introduce the Gated Deep Linear Network framework, which schematizes how pathways of information flow impact learning dynamics. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach toward understanding the design of more complex architectures.
- Data-driven emergence of convolutional structure in neural networks (2022-02-01): We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
- A neural anisotropic view of underspecification in deep learning (2021-04-29): We show that the way neural networks handle the underspecification of problems depends strongly on the data representation. Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
- Learning Connectivity of Neural Networks from a Topological Perspective (2020-08-19): We propose a topological perspective that represents a network as a complete graph for analysis. By assigning learnable parameters to the edges, reflecting the magnitude of the connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? (2020-08-01): We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network. Our analysis sheds light on why deep neural networks perform poorly under adversarial perturbation. We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
This list is automatically generated from the titles and abstracts of the papers on this site.