Information-Theoretic Generalization Bounds of Replay-based Continual Learning
- URL: http://arxiv.org/abs/2507.12043v1
- Date: Wed, 16 Jul 2025 09:00:57 GMT
- Title: Information-Theoretic Generalization Bounds of Replay-based Continual Learning
- Authors: Wen Wen, Tieliang Gong, Yunjiao Zhang, Zeyu Gao, Weizhan Zhang, Yong-Jin Liu
- Abstract summary: Continual learning (CL) has emerged as a dominant paradigm for acquiring knowledge from sequential tasks. We establish a unified theoretical framework for replay-based CL, deriving a series of information-theoretic bounds. We show that utilizing the limited exemplars of previous tasks alongside the current task data, rather than exhaustive replay, facilitates improved generalization.
- Score: 28.460141051954988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) has emerged as a dominant paradigm for acquiring knowledge from sequential tasks while avoiding catastrophic forgetting. Although many CL methods have been proposed and show impressive empirical performance, the theoretical understanding of their generalization behavior remains limited, particularly for replay-based approaches. In this paper, we establish a unified theoretical framework for replay-based CL, deriving a series of information-theoretic bounds that explicitly characterize how the memory buffer interacts with the current task to affect generalization. Specifically, our hypothesis-based bounds reveal that utilizing the limited exemplars of previous tasks alongside the current task data, rather than exhaustive replay, facilitates improved generalization while effectively mitigating catastrophic forgetting. Furthermore, our prediction-based bounds yield tighter and computationally tractable upper bounds of the generalization gap through the use of low-dimensional variables. Our analysis is general and broadly applicable to a wide range of learning algorithms, exemplified by stochastic gradient Langevin dynamics (SGLD) as a representative method. Comprehensive experimental evaluations demonstrate the effectiveness of our derived bounds in capturing the generalization dynamics in replay-based CL settings.
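To make the replay-based setting concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a single training step that mixes a few buffered exemplars from previous tasks with current-task data and updates the model with SGLD. The buffer class, function names, batch sizes, learning rate, and temperature are all illustrative assumptions.

```python
# Hypothetical sketch of replay-based continual learning with an SGLD update.
# All names and hyperparameters are illustrative, not from the paper.
import random
import torch
import torch.nn.functional as F


class ReplayBuffer:
    """Reservoir-style memory buffer holding a limited number of past exemplars."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []   # list of (x, y) pairs from previous tasks
        self.seen = 0

    def add(self, xs, ys):
        # Reservoir sampling keeps the buffer size bounded by `capacity`.
        for x, y in zip(xs, ys):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x, y))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (x, y)

    def sample(self, k):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def sgld_step(model, x, y, lr=1e-3, temperature=1e-4):
    """One SGLD update: a gradient step plus Gaussian noise scaled by the step size."""
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            noise = torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
            p.add_(-lr * p.grad + noise)
    return loss.item()


def replay_step(model, buffer, x_cur, y_cur, replay_size=32):
    """Mix current-task data with a few buffered exemplars, then update with SGLD."""
    if buffer.data:
        x_mem, y_mem = buffer.sample(replay_size)
        x = torch.cat([x_cur, x_mem])
        y = torch.cat([y_cur, y_mem])
    else:
        x, y = x_cur, y_cur
    loss = sgld_step(model, x, y)
    buffer.add(x_cur, y_cur)  # retain a few current exemplars for future tasks
    return loss
```

In this sketch, only a small `replay_size` batch of stored exemplars is mixed into each update rather than replaying the entire buffer, mirroring the abstract's observation that limited replay alongside current-task data suffices for good generalization.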
Related papers
- Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings [8.732260277121547]
We provide a generalization analysis for the Contrastive Representation Learning framework under non-$i.i.d.$ settings. We derive bounds indicating that the required number of samples in each class scales as the logarithm of the covering number of the class of learnable representations associated with that class. Next, we apply our main results to derive excess risk bounds for common function classes such as linear maps and neural networks.
arXiv Detail & Related papers (2025-05-08T04:26:41Z) - Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models. For the first time, we establish almost sure convergence results for continual learning under a general data condition.
arXiv Detail & Related papers (2025-03-24T10:06:07Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further employed to maintain the stability of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning [17.236861687708096]
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge.
Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes.
arXiv Detail & Related papers (2024-04-28T12:25:09Z) - A Unified Approach to Controlling Implicit Regularization via Mirror Descent [18.536453909759544]
Mirror descent (MD) is a notable generalization of gradient descent (GD).
We show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions.
arXiv Detail & Related papers (2023-06-24T03:57:26Z) - Theory on Forgetting and Generalization of Continual Learning [41.85538120246877]
Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding of what factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
arXiv Detail & Related papers (2023-02-12T02:14:14Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z) - In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z) - Extreme Memorization via Scale of Initialization [72.78162454173803]
We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD.
We find that the extent and manner in which generalization ability is affected depends on the activation and loss function used.
In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.
arXiv Detail & Related papers (2020-08-31T04:53:11Z) - Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)