Catastrophic Forgetting in the Context of Model Updates
- URL: http://arxiv.org/abs/2306.10181v1
- Date: Fri, 16 Jun 2023 21:21:41 GMT
- Title: Catastrophic Forgetting in the Context of Model Updates
- Authors: Rich Harang, Hillary Sanders
- Abstract summary: Deep neural networks can cost many thousands of dollars to train.
When new data comes in the pipeline, you can either train a new model from scratch on all existing data, or take the existing model and fine-tune it on the new data.
The former is costly and slow. The latter is cheap and fast, but catastrophic forgetting generally causes the new model to 'forget' how to classify older data well.
- Score: 0.360953887026184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A large obstacle to deploying deep learning models in practice is the process
of updating models post-deployment (ideally, frequently). Deep neural networks
can cost many thousands of dollars to train. When new data comes in the
pipeline, you can train a new model from scratch (randomly initialized weights)
on all existing data. Instead, you can take an existing model and fine-tune
(continue to train) it on new data. The former is costly and slow. The latter
is cheap and fast, but catastrophic forgetting generally causes the new model
to 'forget' how to classify older data well. There are a plethora of
complicated techniques to keep models from forgetting their past learnings.
Arguably the most basic is to mix in a small amount of past data into the new
data during fine-tuning: also known as 'data rehearsal'. In this paper, we
compare various methods of limiting catastrophic forgetting and conclude that
if you can maintain access to a portion of your past data (or tasks), data
rehearsal is ideal in terms of overall accuracy across all time periods, and
performs even better when combined with methods like Elastic Weight
Consolidation (EWC). Especially when the amount of past data (past 'tasks') is
large compared to new data, the cost of updating an existing model is far
cheaper and faster than training a new model from scratch.
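The rehearsal baseline described above is simple to realize. The following is a minimal PyTorch sketch, not the authors' code: a small random fraction of past data is mixed into the fine-tuning loader, and an optional EWC penalty (weighted by an empirical diagonal Fisher estimate) anchors parameters that were important for the old data. Function names and defaults such as rehearsal_fraction and ewc_lambda are illustrative assumptions.

```python
# Minimal sketch of data rehearsal with an optional EWC penalty.
# Assumes a classification model; names and defaults are illustrative,
# not the paper's implementation.
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader, Subset


def make_rehearsal_loader(old_dataset, new_dataset, rehearsal_fraction=0.1,
                          batch_size=64):
    """Mix a small random sample of past data into the fine-tuning set."""
    n_old = int(len(old_dataset) * rehearsal_fraction)
    idx = torch.randperm(len(old_dataset))[:n_old].tolist()
    mixed = ConcatDataset([new_dataset, Subset(old_dataset, idx)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)


def estimate_fisher_diag(model, old_loader, n_batches=10):
    """Empirical diagonal Fisher estimate (squared gradients on old data),
    used by EWC to weight how strongly each parameter is anchored."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for i, (x, y) in enumerate(old_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 / n_batches
    return fisher


def fine_tune(model, old_dataset, new_dataset, epochs=3, lr=1e-4,
              ewc_lambda=0.0, fisher=None):
    """Fine-tune on new data plus rehearsed old data; optionally add an
    EWC penalty that keeps parameters close to their pre-update values."""
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
    loader = make_rehearsal_loader(old_dataset, new_dataset)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            if ewc_lambda > 0 and fisher is not None:
                loss = loss + ewc_lambda * sum(
                    (fisher[n] * (p - anchor[n]) ** 2).sum()
                    for n, p in model.named_parameters())
            loss.backward()
            opt.step()
    return model
```

Setting ewc_lambda to 0 recovers plain rehearsal; per the abstract, rehearsal alone is already a strong baseline, and combining it with EWC helps further.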
Related papers
- LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised
Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods.
This work makes three novel contributions: 1) the retraining process is formulated as a convex problem, so it converges quickly and avoids overfitting; 2) a ruminate block is designed that leverages historical data without needing to store it; and 3) it is proven mathematically that, when fine-tuning the latent vector and reconstructed data, linear formations achieve the least adjusting error between the ground truths and the fine-tuned values.
arXiv Detail & Related papers (2023-10-09T12:36:16Z) - Continual Pre-Training of Large Language Models: How to (re)warm your
model? [21.8468835868142]
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available.
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens).
Our results show that while re-warming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch, even for a large downstream dataset.
arXiv Detail & Related papers (2023-08-08T03:18:18Z) - OLR-WA Online Regression with Weighted Average [0.0]
We introduce a new online linear regression approach to train machine learning models.
The introduced model, named OLR-WA, uses user-defined weights to provide flexibility in the face of changing data.
For consistent data, OLR-WA and the static batch model perform similarly; for varying data, the user can set OLR-WA to adapt more quickly or to resist change (a generic sketch of this weighted-average idea appears after this list).
arXiv Detail & Related papers (2023-07-06T06:39:27Z) - $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained
Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z) - Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing.
We find that fine-tuning is better than linear probing as the number of samples increases (a minimal sketch contrasting the two update modes appears after this list).
arXiv Detail & Related papers (2022-05-13T08:47:06Z) - Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
We show that even simple linear datamodels can successfully predict model outputs.
arXiv Detail & Related papers (2022-02-01T18:15:24Z) - Forward Compatible Training for Representation Learning [53.300192863727226]
Backward compatible training (BCT) modifies training of the new model to make its representations compatible with those of the old model.
However, BCT can significantly hinder the performance of the new model.
In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT).
arXiv Detail & Related papers (2021-12-06T06:18:54Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z) - Update Frequently, Update Fast: Retraining Semantic Parsing Systems in a
Fraction of Time [11.035461657669096]
We show that it is possible to match the performance of a model trained from scratch in less than 10% of the time via fine-tuning.
We demonstrate the effectiveness of our method on multiple splits of the Facebook TOP and SNIPS datasets.
arXiv Detail & Related papers (2020-10-15T16:37:41Z) - Neural Network Retraining for Model Serving [32.857847595096025]
We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
arXiv Detail & Related papers (2020-04-29T13:52:28Z)
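The OLR-WA entry above mentions user-defined weights that trade off adapting to new data against resisting change. The sketch below is not the paper's OLR-WA update rule; it only illustrates the generic weighted-average idea, blending the current regression coefficients with an ordinary least-squares fit on the incoming batch. The function name and blend parameter are assumptions.

```python
# Generic weighted-average update for online linear regression.
# NOT the exact OLR-WA algorithm; a rough illustration of the
# adapt-vs-resist trade-off controlled by a user-chosen weight.
import numpy as np


def weighted_average_update(w_old, X_batch, y_batch, new_weight=0.5):
    """Blend existing coefficients with a least-squares fit on new data.

    new_weight close to 1.0 adapts quickly to the new batch;
    new_weight close to 0.0 resists change and keeps the old model.
    """
    w_batch, *_ = np.linalg.lstsq(X_batch, y_batch, rcond=None)
    return (1.0 - new_weight) * w_old + new_weight * w_batch
```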
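For the 'Revisiting the Updates of a Pre-trained Model for Few-shot Learning' entry, the two update modes compared are easy to show side by side. The sketch below is a generic illustration, not the paper's code: linear probing freezes the pre-trained backbone and trains only a new linear head, while fine-tuning leaves every parameter trainable. The toy backbone, feature dimension, and optimizer are assumptions.

```python
# Linear probing vs. fine-tuning: a generic PyTorch illustration.
import torch
import torch.nn as nn


def build_update(pretrained_backbone: nn.Module, feature_dim: int,
                 num_classes: int, mode: str = "fine_tune", lr: float = 1e-3):
    """Attach a new head and choose which parameters to train."""
    head = nn.Linear(feature_dim, num_classes)   # new task-specific head
    model = nn.Sequential(pretrained_backbone, head)
    if mode == "linear_probe":
        for p in pretrained_backbone.parameters():
            p.requires_grad = False              # freeze the backbone
    elif mode != "fine_tune":
        raise ValueError(f"unknown mode: {mode}")
    trainable = [p for p in model.parameters() if p.requires_grad]
    return model, torch.optim.SGD(trainable, lr=lr)


# Example with a toy 'pre-trained' backbone producing 64-dim features.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
probe_model, probe_opt = build_update(backbone, 64, 10, mode="linear_probe")
```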
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated summaries) and is not responsible for any consequences of its use.