Overcoming the Stability Gap in Continual Learning
- URL: http://arxiv.org/abs/2306.01904v4
- Date: Mon, 16 Sep 2024 19:32:48 GMT
- Title: Overcoming the Stability Gap in Continual Learning
- Authors: Md Yousuf Harun, Christopher Kanan
- Abstract summary: Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making business decisions and to serve users.
A major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users.
Here, we study how continual learning (CL) could potentially overcome model decay in large pre-trained DNNs.
- Score: 15.8696301825572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making business decisions and to serve users; however, a major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users. To mitigate model decay, DNNs are retrained from scratch using old and new data. This is computationally expensive, so retraining happens only once performance significantly decreases. Here, we study how continual learning (CL) could potentially overcome model decay in large pre-trained DNNs and greatly reduce computational costs for keeping DNNs up-to-date. We identify the "stability gap" as a major obstacle in our setting. The stability gap refers to a phenomenon where learning new data causes large drops in performance for past tasks before CL mitigation methods eventually compensate for this drop. We test two hypotheses to investigate the factors influencing the stability gap and identify a method that vastly reduces this gap. In large-scale experiments for both easy and hard CL distributions (e.g., class incremental learning), we demonstrate that our method reduces the stability gap and greatly increases computational efficiency. Our work aligns CL with the goals of the production setting, where CL is needed for many applications.
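As an illustration of the setting in the abstract, the sketch below logs old-task accuracy after every gradient step while a rehearsal-based learner trains on new data; that per-step probe is what makes the stability gap visible. It is a minimal, hypothetical setup (toy model, synthetic tasks, assumed 50/50 rehearsal ratio), not the authors' method or code.
```python
# Minimal sketch (not the authors' code): probe old-task accuracy after every
# gradient step of rehearsal-based continual learning on a new task. The model,
# synthetic tasks, and 50/50 rehearsal ratio are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(feature_idx, offset, n=2048, d=32):
    # Synthetic binary task: label is whether one coordinate exceeds the offset.
    x = torch.randn(n, d) + offset
    y = (x[:, feature_idx] > offset).long()
    return x, y

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

old_x, old_y = make_task(0, 0.0)    # task the deployed model already knows
new_x, new_y = make_task(1, 3.0)    # newly arriving data distribution
test_x, test_y = make_task(0, 0.0)  # held-out probe for the old task

# Phase 1: pre-train on the old task.
for _ in range(300):
    idx = torch.randint(0, len(old_x), (64,))
    loss = loss_fn(model(old_x[idx]), old_y[idx])
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: continual learning on the new task with rehearsal.
# Logging old-task accuracy per step is what exposes the stability gap:
# a sharp initial drop before the rehearsal mitigation catches up.
old_task_acc = []
for step in range(300):
    new_idx = torch.randint(0, len(new_x), (32,))
    buf_idx = torch.randint(0, len(old_x), (32,))  # rehearsal samples (assumed 50/50 mix)
    xb = torch.cat([new_x[new_idx], old_x[buf_idx]])
    yb = torch.cat([new_y[new_idx], old_y[buf_idx]])
    loss = loss_fn(model(xb), yb)
    opt.zero_grad(); loss.backward(); opt.step()
    old_task_acc.append(accuracy(model, test_x, test_y))

print("old-task accuracy over the first 10 new-task steps:",
      [round(a, 2) for a in old_task_acc[:10]])
```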
Related papers
- Exploring the Stability Gap in Continual Learning: The Role of the Classification Head [0.6749750044497732]
The stability gap is a phenomenon where models initially lose performance on previously learned tasks before partially recovering during training.
We introduce the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap.
Our experiments demonstrate that NMC not only improves final performance, but also significantly enhances training stability across various continual learning benchmarks.
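For reference, a minimal sketch of the nearest-mean classifier idea in the entry above: each class is represented by the mean of its feature vectors, and samples are assigned to the nearest prototype. The synthetic features below stand in for backbone embeddings; this is not the paper's implementation.
```python
# Minimal nearest-mean classifier (NMC) sketch; synthetic features stand in for
# embeddings produced by a continual-learning backbone.
import torch

def class_means(features, labels, num_classes):
    # One prototype per class: the mean feature vector of its examples.
    return torch.stack([features[labels == c].mean(0) for c in range(num_classes)])

def nmc_predict(features, prototypes):
    # Assign each sample to the class whose prototype is nearest in feature space.
    return torch.cdist(features, prototypes).argmin(dim=1)

torch.manual_seed(0)
labels = torch.arange(100) % 4                                   # 4 dummy classes
feats = torch.randn(100, 16) + 2.0 * labels.float().unsqueeze(1)
protos = class_means(feats, labels, num_classes=4)
print("NMC accuracy:", (nmc_predict(feats, protos) == labels).float().mean().item())
```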
arXiv Detail & Related papers (2024-11-06T15:45:01Z)
- MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL [20.22674077197914]
Recent work has explored updating neural networks with large numbers of gradient steps for every new sample.
High update-to-data ratios introduce instability to the training process.
Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training.
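A rough sketch of the data-mixing idea as summarized above: at a high update-to-data (UTD) ratio, every update batch includes a small fraction of transitions branched from real states via a learned dynamics model. The 5% fraction, UTD of 8, and the toy stand-in model and policy are illustrative assumptions, not the MAD-TD algorithm itself.
```python
# Rough sketch of mixing model-generated transitions into high-UTD training.
# The toy dynamics model, random policy, 5% generated fraction, and UTD ratio of 8
# are illustrative assumptions, not the MAD-TD algorithm itself.
import random
import torch

W = 0.1 * torch.randn(4, 6)  # parameters of a toy "learned" dynamics model

def dynamics_model(state, action):
    # Stand-in for a learned model: predicts next state and reward.
    x = torch.cat([state, action])
    return state + W @ x, float(x.sum())

def mixed_batch(replay, policy, batch_size=256, gen_frac=0.05):
    n_gen = int(batch_size * gen_frac)
    real = random.sample(replay, batch_size - n_gen)
    generated = []
    for s, _a, _r, _s2 in random.sample(replay, n_gen):  # branch from real states
        a_new = policy(s)
        s2_new, r_new = dynamics_model(s, a_new)
        generated.append((s, a_new, r_new, s2_new))
    return real + generated

# Usage: toy replay buffer, random policy, 8 critic updates per environment step.
replay = [(torch.randn(4), torch.randn(2), 0.0, torch.randn(4)) for _ in range(1000)]
policy = lambda s: torch.randn(2)
for _ in range(8):
    batch = mixed_batch(replay, policy)
    # ... a TD update of the critic on `batch` would go here ...
```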
arXiv Detail & Related papers (2024-10-11T15:13:17Z)
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
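One possible reading of "integrating multiple previous posterior estimations" is a variational objective whose KL regularizer pulls the current posterior toward several earlier posteriors rather than only the most recent one. The form below is illustrative only; the number of retained posteriors n and the weights lambda_j are assumptions, not the paper's formulation.
```latex
% Illustrative objective only; n and \lambda_j are assumptions.
\mathcal{L}_t(q_t) =
  \mathbb{E}_{\theta \sim q_t}\big[\log p(\mathcal{D}_t \mid \theta)\big]
  - \sum_{j=1}^{n} \lambda_j \,
    \mathrm{KL}\!\big(q_t(\theta) \,\|\, q_{t-j}(\theta)\big),
  \qquad \lambda_j \ge 0 .
```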
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Forget but Recall: Incremental Latent Rectification in Continual Learning [21.600690867361617]
The intrinsic capability to continuously learn from a changing data stream is a desideratum of deep neural networks (DNNs).
Existing Continual Learning approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks.
This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR.
arXiv Detail & Related papers (2024-06-25T08:57:47Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
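The sketch below illustrates why this is possible: a fully convolutional 1-D network has no fixed-size dense layer, so weights trained on short windows apply unchanged to arbitrarily long inputs. The denoising task, architecture, and sizes are assumptions, not the paper's setup.
```python
# Train a fully convolutional 1-D network on short windows, then run it on a
# much longer signal with the same weights. Task and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=1),  # per-position output, no dense layer
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Training: denoise sine segments of length 64.
window = torch.arange(64).float().view(1, 1, -1)
for _ in range(200):
    phase = 100 * torch.rand(32, 1, 1)
    clean = torch.sin(0.2 * (window + phase))
    noisy = clean + 0.3 * torch.randn_like(clean)
    loss = F.mse_loss(net(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()

# Evaluation: the same network processes a signal of length 8192.
long_grid = torch.arange(8192).float().view(1, 1, -1)
clean = torch.sin(0.2 * long_grid)
noisy = clean + 0.3 * torch.randn_like(clean)
with torch.no_grad():
    print("MSE on an 8192-sample signal:", F.mse_loss(net(noisy), clean).item())
```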
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
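A simplified, classification-style analogue of such a train-time calibration term (the paper targets object detection and its exact loss differs): an auxiliary penalty on the gap between predicted confidence and whether the prediction is actually correct.
```python
# Simplified train-time calibration term (classification analogue; not the
# paper's exact detection loss): penalize the confidence-correctness gap.
import torch
import torch.nn.functional as F

def calibration_aux_loss(logits, targets):
    probs = logits.softmax(dim=1)
    conf, pred = probs.max(dim=1)             # predicted class confidence
    correct = (pred == targets).float()       # 1 if the prediction is right
    return ((conf - correct) ** 2).mean()     # push confidence toward accuracy

def total_loss(logits, targets, beta=0.5):    # beta is an assumed weighting
    return F.cross_entropy(logits, targets) + beta * calibration_aux_loss(logits, targets)

# Usage on dummy data:
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
total_loss(logits, targets).backward()
```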
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Challenging Common Assumptions about Catastrophic Forgetting [13.1202659074346]
We study the progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms in long sequences of tasks with data re-occurrence.
We propose a new framework, SCoLe, to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD.
arXiv Detail & Related papers (2022-07-10T21:40:54Z)
- Balanced Softmax Cross-Entropy for Incremental Learning [6.5423218639215275]
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks.
Recent methods have proven effective at mitigating catastrophic forgetting.
We propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with existing methods for incremental learning to improve their performance.
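For reference, balanced softmax cross-entropy shifts each logit by the log of its class sample count before the usual cross-entropy, compensating for the imbalance between the few stored exemplars of old classes and the many samples of new classes. The counts and sizes below are dummy values.
```python
# Minimal balanced softmax cross-entropy: add log class counts to the logits,
# then apply ordinary cross-entropy. Counts and sizes below are dummy values.
import torch
import torch.nn.functional as F

def balanced_softmax_ce(logits, targets, class_counts):
    # Equivalent to modeling p(y|x) proportional to n_y * exp(z_y).
    adjusted = logits + class_counts.float().clamp(min=1).log()
    return F.cross_entropy(adjusted, targets)

# Usage: 5 old classes with 20 stored exemplars each, 5 new classes with 500 samples each.
counts = torch.tensor([20] * 5 + [500] * 5)
logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
balanced_softmax_ce(logits, targets, counts).backward()
```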
arXiv Detail & Related papers (2021-03-23T13:30:26Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from a real-valued network via the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
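The guided paradigm amounts to distilling the student on the teacher's prediction distribution, with no labels involved; below is a minimal KL-based sketch in which small MLPs stand in for the real-valued teacher and the binary (1-bit) student.
```python
# Minimal distillation-on-the-prediction-distribution sketch. The two small MLPs
# are placeholders; in the paper the student is a binary (1-bit) network and the
# teacher is a real-valued self-supervised model.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 100)).eval()
student = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 100))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_loss(student_logits, teacher_logits, tau=1.0):
    # KL between softened prediction distributions; no ground-truth labels needed.
    return F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                    F.softmax(teacher_logits / tau, dim=1),
                    reduction="batchmean") * tau * tau

x = torch.randn(32, 128)  # stand-in for a batch of augmented-image features
with torch.no_grad():
    t_logits = teacher(x)
loss = distill_loss(student(x), t_logits)
opt.zero_grad(); loss.backward(); opt.step()
```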
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
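For reference, the elastic weight consolidation penalty mentioned above anchors parameters that mattered for earlier tasks with a quadratic term weighted by a diagonal Fisher estimate; a minimal sketch follows, with a tiny RNN and synthetic data as stand-ins.
```python
# Minimal elastic weight consolidation (EWC) sketch for an RNN. The diagonal
# Fisher estimate uses squared gradients from one old-task batch; the tiny
# network, data, and lambda are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)
params = list(rnn.parameters()) + list(head.parameters())

def task_loss(x, y):
    out, _ = rnn(x)
    return F.cross_entropy(head(out[:, -1]), y)

# After finishing a task: record parameter values and importance estimates.
x_old, y_old = torch.randn(64, 10, 8), torch.randint(0, 2, (64,))
task_loss(x_old, y_old).backward()
fisher = [p.grad.detach() ** 2 for p in params]   # crude diagonal Fisher estimate
anchors = [p.detach().clone() for p in params]
for p in params:
    p.grad = None

def ewc_penalty(lam=100.0):
    return lam * sum((f * (p - a) ** 2).sum() for f, p, a in zip(fisher, params, anchors))

# While training on the next task, add the penalty to the new-task loss.
x_new, y_new = torch.randn(64, 10, 8), torch.randint(0, 2, (64,))
loss = task_loss(x_new, y_new) + ewc_penalty()
loss.backward()
```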
arXiv Detail & Related papers (2020-06-22T10:05:12Z)