Related papers: Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities

Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities

URL: http://arxiv.org/abs/2602.20791v1
Date: Tue, 24 Feb 2026 11:29:12 GMT
Title: Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities
Authors: JinLi He, Liang Bai, Xian Yang,
Abstract summary: We formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem.<n>We derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale.<n>We validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets.
Score: 11.882528379148141
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Rehearsal is one of the key techniques for mitigating catastrophic forgetting and has been widely adopted in continual learning algorithms due to its simplicity and practicality. However, the theoretical understanding of how rehearsal scale influences learning dynamics remains limited. To address this gap, we formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem, providing a unified characterization across diverse performance metrics. Within this framework, we derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. Our results uncover several intriguing and counterintuitive findings. First, rehearsal can impair model's adaptability, in sharp contrast to its traditionally recognized benefits. Second, increasing the rehearsal scale does not necessarily improve memory retention. When tasks are similar and noise levels are low, the memory error exhibits a diminishing lower bound. Finally, we validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets, revealing statistical patterns of rehearsal mechanisms in continual learning.

Related papers

The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness.<n>We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z)
Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models.<n>We establish the almost sure convergence results of continual learning under a general data condition for the first time.
arXiv Detail & Related papers (2025-03-24T10:06:07Z)
In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data.<n>Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature. We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. We propose to make the learning rate schedule explicit with a simple re- parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard. The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
Understanding Self-Predictive Learning for Reinforcement Learning [61.62067048348786]
We study the learning dynamics of self-predictive learning for reinforcement learning. We propose a novel self-predictive algorithm that learns two representations simultaneously.
arXiv Detail & Related papers (2022-12-06T20:43:37Z)
An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
Rehearsal revealed: The limits and merits of revisiting samples in continual learning [43.40531878205344]
We provide insight into the limits and merits of rehearsal, one of continual learning's most established methods. We show that models trained sequentially with rehearsal tend to stay in the same low-loss region after a task has finished, but are at risk of overfitting on its sample memory.
arXiv Detail & Related papers (2021-04-15T13:28:14Z)
Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.