Theory on Forgetting and Generalization of Continual Learning
- URL: http://arxiv.org/abs/2302.05836v1
- Date: Sun, 12 Feb 2023 02:14:14 GMT
- Title: Theory on Forgetting and Generalization of Continual Learning
- Authors: Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
- Abstract summary: Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding of what factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
- Score: 41.85538120246877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL), which aims to learn a sequence of tasks, has
attracted significant recent attention. However, most work has focused on the
experimental performance of CL, and theoretical studies of CL are still
limited. In particular, there is a lack of understanding of what factors are
important and how they affect "catastrophic forgetting" and generalization
performance. To fill this gap, our theoretical analysis, under
overparameterized linear models, provides the first-known explicit form of the
expected forgetting and generalization error. Further analysis of such a key
result yields a number of theoretical explanations about how
overparameterization, task similarity, and task ordering affect both forgetting
and generalization error of CL. More interestingly, by conducting experiments
on real datasets using deep neural networks (DNNs), we show that some of these
insights even go beyond the linear models and can be carried over to practical
setups. In particular, we use concrete examples to show that our results not
only explain some interesting empirical observations in recent studies, but
also motivate better practical algorithm designs of CL.
Related papers
- InfoNCE: Identifying the Gap Between Theory and Practice [15.744372232355]
We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in an anisotropic setting.
We show that AnInfoNCE increases the recovery of previously collapsed information in CIFAR10 and ImageNet, albeit at the cost of downstream accuracy.
arXiv Detail & Related papers (2024-06-28T16:08:26Z) - Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time.
Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks.
The mixture-of-experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z) - Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, ordering tasks so that those with larger eigenvalues in their population data covariance matrices are trained later tends to result in increased forgetting.
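The ordering effect described in this summary can be probed with a small simulation, assuming the same minimum-distance interpolation dynamics used for overparameterized linear models; the covariance scales, dimensions, and helper names here are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 40, 15                      # overparameterized: d > n per task

def sample_task(scale):
    # Population covariance = scale^2 * I, so larger `scale` means larger eigenvalues.
    X = scale * rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    return X, X @ w_true, w_true

def train_seq(tasks):
    # Sequential GD limit in the overparameterized regime: after each task,
    # move to the interpolator closest to the current weights.
    w = np.zeros(d)
    snapshots = []
    for X, y, _ in tasks:
        w = w + np.linalg.pinv(X) @ (y - X @ w)
        snapshots.append(w.copy())
    return snapshots

def forgetting(tasks):
    snaps = train_seq(tasks)
    w_final = snaps[-1]
    # Average increase in error on each earlier task after the full sequence.
    return np.mean([
        np.sum((w_final - w_true) ** 2) - np.sum((snaps[i] - w_true) ** 2)
        for i, (_, _, w_true) in enumerate(tasks[:-1])
    ])

small = sample_task(0.5)           # small-eigenvalue covariance
large = sample_task(2.0)           # large-eigenvalue covariance
print("large-eigenvalue task last: ", forgetting([small, large]))
print("large-eigenvalue task first:", forgetting([large, small]))
```

Averaging both orderings over many random draws, rather than a single seed, is needed to see the trend the summary describes.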
arXiv Detail & Related papers (2024-05-27T18:33:37Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z) - Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.