Theory on Forgetting and Generalization of Continual Learning
- URL: http://arxiv.org/abs/2302.05836v1
- Date: Sun, 12 Feb 2023 02:14:14 GMT
- Title: Theory on Forgetting and Generalization of Continual Learning
- Authors: Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
- Abstract summary: Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding of what factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
- Score: 41.85538120246877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL), which aims to learn a sequence of tasks, has
attracted significant recent attention. However, most work has focused on the
experimental performance of CL, and theoretical studies of CL are still
limited. In particular, there is a lack of understanding of what factors are
important and how they affect "catastrophic forgetting" and generalization
performance. To fill this gap, our theoretical analysis, under
overparameterized linear models, provides the first-known explicit form of the
expected forgetting and generalization error. Further analysis of such a key
result yields a number of theoretical explanations about how
overparameterization, task similarity, and task ordering affect both forgetting
and generalization error of CL. More interestingly, by conducting experiments
on real datasets using deep neural networks (DNNs), we show that some of these
insights even go beyond the linear models and can be carried over to practical
setups. In particular, we use concrete examples to show that our results not
only explain some interesting empirical observations in recent studies, but
also motivate better practical algorithm designs of CL.
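To make the analyzed setting concrete, the following is a minimal sketch (not the authors' code) of continual learning with overparameterized linear regression: each task is learned by moving the current weights to the nearest interpolator of that task's data, which is what (S)GD converges to from the current iterate, and forgetting and generalization error are then measured with the usual definitions. The sizes `p`, `n`, `T` and the task-similarity knob `sigma` are illustrative choices, and the paper's exact metrics may be normalized differently.

```python
# Illustrative sketch, not the paper's code: sequential linear regression
# tasks with p parameters and n < p samples per task (overparameterized).
import numpy as np

rng = np.random.default_rng(0)
p, n, T = 100, 40, 5             # parameters, samples per task, number of tasks
sigma = 0.1                      # task similarity: smaller sigma = more similar tasks

w0 = rng.normal(size=p) / np.sqrt(p)
w_star = [w0 + sigma * rng.normal(size=p) / np.sqrt(p) for _ in range(T)]  # task ground truths
X = [rng.normal(size=(n, p)) for _ in range(T)]
y = [X[t] @ w_star[t] for t in range(T)]                                   # noiseless labels

def train_on_task(w_prev, Xt, yt):
    """Move to the interpolator of (Xt, yt) closest to w_prev (minimum-norm correction)."""
    return w_prev + np.linalg.pinv(Xt) @ (yt - Xt @ w_prev)

def test_error(w, w_true):
    """Population squared error for isotropic Gaussian features."""
    return float(np.sum((w - w_true) ** 2))

w = np.zeros(p)
err_after_learning = []          # error on task t right after it is learned
for t in range(T):
    w = train_on_task(w, X[t], y[t])
    err_after_learning.append(test_error(w, w_star[t]))

final_err = [test_error(w, w_star[t]) for t in range(T)]
forgetting = np.mean([final_err[t] - err_after_learning[t] for t in range(T - 1)])
generalization = np.mean(final_err)
print(f"forgetting = {forgetting:.4f}, generalization error = {generalization:.4f}")
```

Varying `p` relative to `n` (the degree of overparameterization), `sigma` (task similarity), or the order of the tasks probes exactly the factors whose effects the paper characterizes.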
Related papers
- Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning [37.745896674964186]
Multi-task learning (MTL) aims to improve the generalization performance of a model on multiple related tasks by training it simultaneously on those tasks.
Continual learning (CL) involves adapting to new sequentially arriving tasks over time without forgetting the previously acquired knowledge.
We develop theoretical results describing the effect of various system parameters on the model's performance in an MTL setup.
Our results reveal the impact of buffer size and model capacity on the forgetting rate in a CL setup and help shed light on some of the state-of-the-art CL methods.
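As a rough illustration of the replay-based setting such an analysis targets, the sketch below keeps a fixed-size buffer of past examples via reservoir sampling and mixes them into every update; the buffer capacity plays the role of the buffer size studied in the theory. This is purely illustrative: `ReplayBuffer`, `train_task`, and `model_update` are hypothetical names, not the paper's API.

```python
# Illustrative replay-based continual learning loop (not the paper's setup).
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling over the task stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling: every example seen so far is retained with equal probability.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def train_task(model_update, task_stream, buffer, replay_batch=8):
    """model_update is a hypothetical training step; task_stream yields lists of (x, y) pairs."""
    for batch in task_stream:
        replayed = buffer.sample(replay_batch)   # examples from earlier tasks
        model_update(batch + replayed)           # train on current data mixed with replay
        for example in batch:
            buffer.add(example)
```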
arXiv Detail & Related papers (2024-08-29T23:22:40Z)
- InfoNCE: Identifying the Gap Between Theory and Practice [15.744372232355]
We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in an anisotropic setting.
We show that AnInfoNCE increases the recovery of previously collapsed information in CIFAR10 and ImageNet, albeit at the cost of downstream accuracy.
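For context, here is a minimal sketch of the standard (isotropic) InfoNCE objective that AnInfoNCE generalizes; it is the conventional batch formulation with positives on the diagonal, not the paper's AnInfoNCE loss, and the function name and temperature value are illustrative.

```python
# Standard InfoNCE over a batch of paired embeddings (background only).
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, d) embeddings of two views; positives are matched rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                     # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))           # cross-entropy toward the diagonal
```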
arXiv Detail & Related papers (2024-06-28T16:08:26Z)
- Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time.
Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks.
The mixture-of-experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL by employing a gating network.
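To illustrate the gating mechanism in general terms (this is not the construction analyzed in the paper, and `moe_forward`, `gate_W`, and `experts_W` are hypothetical names), a minimal dense mixture-of-experts forward pass looks like the sketch below; sparse top-1 routing variants let different tasks activate different experts, which is the kind of mechanism that can limit interference.

```python
# Minimal dense mixture-of-experts forward pass (illustrative only).
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, gate_W, experts_W):
    """x: (N, d); gate_W: (d, E); experts_W: list of E expert matrices, each (d, out)."""
    gates = softmax(x @ gate_W)                             # (N, E) gating weights
    outputs = np.stack([x @ W for W in experts_W], axis=1)  # (N, E, out) expert outputs
    return np.einsum("ne,neo->no", gates, outputs)          # gate-weighted combination

# Usage: 4 inputs of dimension 16 routed over 3 experts producing 8-dimensional outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
gate_W = rng.normal(size=(16, 3))
experts_W = [rng.normal(size=(16, 8)) for _ in range(3)]
print(moe_forward(x, gate_W, experts_W).shape)  # (4, 8)
```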
arXiv Detail & Related papers (2024-06-24T08:29:58Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find that CLIP pre-trained on such data exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model trained via gradient descent.
We demonstrate that, given a sufficiently large data size, ordering the tasks so that those with larger eigenvalues in their population data covariance matrices are trained later tends to result in increased forgetting.
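One common way this line of work formalizes forgetting (the paper's exact definition and normalization may differ) is the average increase in each earlier task's loss between the moment that task was learned and the end of the sequence:

```latex
% w^{(t)}: model after sequentially training on tasks 1, ..., t;  L_t: population loss of task t
F_T \;=\; \frac{1}{T-1} \sum_{t=1}^{T-1} \Big( L_t\big(w^{(T)}\big) - L_t\big(w^{(t)}\big) \Big)
```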
arXiv Detail & Related papers (2024-05-27T18:33:37Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z)
- Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.