Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
- URL: http://arxiv.org/abs/2411.04723v2
- Date: Mon, 25 Nov 2024 10:09:05 GMT
- Title: Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
- Authors: Wojciech Łapacz, Daniel Marczak, Filip Szatkowski, Tomasz Trzciński
- Abstract summary: The stability gap is a phenomenon where models initially lose performance on previously learned tasks before partially recovering during training.
We introduce the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap.
Our experiments demonstrate that NMC not only improves final performance, but also significantly enhances training stability across various continual learning benchmarks.
- Score: 0.6749750044497732
- Abstract: Continual learning (CL) has emerged as a critical area in machine learning, enabling neural networks to learn from evolving data distributions while mitigating catastrophic forgetting. However, recent research has identified the stability gap -- a phenomenon where models initially lose performance on previously learned tasks before partially recovering during training. Such learning dynamics are contradictory to the intuitive understanding of stability in continual learning where one would expect the performance to degrade gradually instead of rapidly decreasing and then partially recovering later. To better understand and alleviate the stability gap, we investigate it at different levels of the neural network architecture, particularly focusing on the role of the classification head. We introduce the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap. Our experiments demonstrate that NMC not only improves final performance, but also significantly enhances training stability across various continual learning benchmarks, including CIFAR100, ImageNet100, CUB-200, and FGVC Aircrafts. Moreover, we find that NMC also reduces task-recency bias. Our analysis provides new insights into the stability gap and suggests that the primary contributor to this phenomenon is the linear head, rather than the insufficient representation learning.
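The nearest-mean classifier mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so the following assumes that class means are computed from backbone feature vectors and that prediction uses squared Euclidean distance; the function names `fit_class_means` and `nmc_predict` and the toy features are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_class_means(features, labels):
    """Compute one mean feature vector (prototype) per class.

    features: array of shape (n_samples, n_dims), assumed to come from a backbone.
    labels:   array of shape (n_samples,) with integer class ids.
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means


def nmc_predict(features, classes, means):
    """Assign each sample to the class whose mean is nearest (squared Euclidean)."""
    # Pairwise squared distances, shape (n_samples, n_classes).
    dists = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


# Toy stand-in for backbone features: two well-separated classes in 2-D.
train_feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_labels = np.array([0, 0, 1, 1])

classes, means = fit_class_means(train_feats, train_labels)
preds = nmc_predict(np.array([[0.2, 0.3], [5.0, 6.0]]), classes, means)
```

Because a classifier of this kind has no trainable head weights, evaluating it alongside the linear head on the same backbone features is one way to separate the two components' contributions, which is the attribution role the abstract describes.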
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- Auxiliary Classifiers Improve Stability and Efficiency in Continual Learning [13.309853617922824]
We investigate the stability of intermediate neural network layers during continual learning.
We show auxiliary classifiers (ACs) can leverage this stability to improve performance.
Our findings suggest that ACs offer a promising avenue for enhancing continual learning models.
arXiv Detail & Related papers (2024-03-12T08:33:26Z)
- Investigating the Edge of Stability Phenomenon in Reinforcement Learning [20.631461205889487]
We explore the edge of stability phenomenon in reinforcement learning (RL).
Despite significant differences to supervised learning, the edge of stability phenomenon can be present in off-policy deep RL.
Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.
arXiv Detail & Related papers (2023-07-09T15:46:27Z)
- On the Stability-Plasticity Dilemma of Class-Incremental Learning [50.863180812727244]
A primary goal of class-incremental learning is to strike a balance between stability and plasticity.
This paper aims to shed light on how effectively recent class-incremental learning algorithms address the stability-plasticity trade-off.
arXiv Detail & Related papers (2023-04-04T09:34:14Z)
- Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning [23.15206507040553]
We propose Auxiliary Network Continual Learning (ANCL) to equip the neural network with the ability to learn the current task.
ANCL applies an additional auxiliary network which promotes plasticity to the continually learned model which mainly focuses on stability.
More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability.
arXiv Detail & Related papers (2023-03-16T17:00:42Z)
- New Insights on Relieving Task-Recency Bias for Online Class Incremental Learning [37.888061221999294]
Among all settings, online class incremental learning (OCIL) is more challenging and is encountered more frequently in the real world.
To strike a preferable trade-off between stability and plasticity, we propose an Adaptive Focus Shifting algorithm.
arXiv Detail & Related papers (2023-02-16T11:52:00Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- Continual evaluation for lifelong learning: Identifying the stability gap [35.99653845083381]
We show that a set of common state-of-the-art methods still suffers from substantial forgetting upon starting to learn new tasks.
We refer to this intriguing but potentially problematic phenomenon as the stability gap.
We establish a framework for continual evaluation that uses per-iteration evaluation and we define a new set of metrics to quantify worst-case performance.
arXiv Detail & Related papers (2022-05-26T15:56:08Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.