Theory on Forgetting and Generalization of Continual Learning
- URL: http://arxiv.org/abs/2302.05836v1
- Date: Sun, 12 Feb 2023 02:14:14 GMT
- Title: Theory on Forgetting and Generalization of Continual Learning
- Authors: Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
- Abstract summary: Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding of what factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
- Score: 41.85538120246877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL), which aims to learn a sequence of tasks, has
attracted significant recent attention. However, most work has focused on the
experimental performance of CL, and theoretical studies of CL are still
limited. In particular, there is a lack of understanding of what factors are
important and how they affect "catastrophic forgetting" and generalization
performance. To fill this gap, our theoretical analysis, under
overparameterized linear models, provides the first-known explicit form of the
expected forgetting and generalization error. Further analysis of such a key
result yields a number of theoretical explanations about how
overparameterization, task similarity, and task ordering affect both forgetting
and generalization error of CL. More interestingly, by conducting experiments
on real datasets using deep neural networks (DNNs), we show that some of these
insights even go beyond the linear models and can be carried over to practical
setups. In particular, we use concrete examples to show that our results not
only explain some interesting empirical observations in recent studies, but
also motivate better practical algorithm designs of CL.
Related papers
- InfoNCE: Identifying the Gap Between Theory and Practice [15.744372232355]
We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in an anisotropic setting.
We show that AnInfoNCE increases the recovery of previously collapsed information in CIFAR10 and ImageNet, albeit at the cost of downstream accuracy.
arXiv Detail & Related papers (2024-06-28T16:08:26Z) - Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time.
Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks.
The mixture-of-experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z) - Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, ordering tasks so that those with larger eigenvalues in their population data covariance matrices are trained later tends to result in increased forgetting.
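The ordering effect described in this summary can be probed with a small simulation, assuming the same minimum-distance interpolation dynamics used for overparameterized linear models; the covariance scales, dimensions, and helper names here are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 40, 15                      # overparameterized: d > n per task

def sample_task(scale):
    # Population covariance = scale^2 * I, so larger `scale` means larger eigenvalues.
    X = scale * rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    return X, X @ w_true, w_true

def train_seq(tasks):
    # Sequential GD limit in the overparameterized regime: after each task,
    # move to the interpolator closest to the current weights.
    w = np.zeros(d)
    snapshots = []
    for X, y, _ in tasks:
        w = w + np.linalg.pinv(X) @ (y - X @ w)
        snapshots.append(w.copy())
    return snapshots

def forgetting(tasks):
    snaps = train_seq(tasks)
    w_final = snaps[-1]
    # Average increase in error on each earlier task after the full sequence.
    return np.mean([
        np.sum((w_final - w_true) ** 2) - np.sum((snaps[i] - w_true) ** 2)
        for i, (_, _, w_true) in enumerate(tasks[:-1])
    ])

small = sample_task(0.5)           # small-eigenvalue covariance
large = sample_task(2.0)           # large-eigenvalue covariance
print("large-eigenvalue task last: ", forgetting([small, large]))
print("large-eigenvalue task first:", forgetting([large, small]))
```

Averaging both orderings over many random draws, rather than a single seed, is needed to see the trend the summary describes.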
arXiv Detail & Related papers (2024-05-27T18:33:37Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z) - Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.