Neural Network Retraining for Model Serving
- URL: http://arxiv.org/abs/2004.14203v1
- Date: Wed, 29 Apr 2020 13:52:28 GMT
- Title: Neural Network Retraining for Model Serving
- Authors: Diego Klabjan, Xiaofeng Zhu
- Abstract summary: We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
- Score: 32.857847595096025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose incremental (re)training of a neural network model to cope with a
continuous flow of new data in inference during model serving. As such, this is
a life-long learning process. We address two challenges of life-long
retraining: catastrophic forgetting and efficient retraining. If we combine all
past and new data it can easily become intractable to retrain the neural
network model. On the other hand, if the model is retrained using only new
data, it can easily suffer catastrophic forgetting and thus it is paramount to
strike the right balance. Moreover, if we retrain all weights of the model
every time new data is collected, retraining tends to require too many
computing resources. To solve these two issues, we propose a novel retraining
model that can select important samples and important weights utilizing
multi-armed bandits. To further address forgetting, we propose a new
regularization term focusing on synapse and neuron importance. We analyze
multiple datasets to document the outcome of the proposed retraining methods.
Various experiments demonstrate that our retraining methodologies mitigate the
catastrophic forgetting problem while boosting model performance.
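The two ingredients highlighted in the abstract, bandit-driven selection of what to retrain on and an importance-based regularizer against forgetting, can be pictured with a minimal sketch. This is only an illustration under assumptions (an epsilon-greedy bandit over data pools and an EWC-style quadratic penalty); it is not the paper's exact formulation, and all names below are made up for the example.

```python
import torch

def importance_penalty(model, importance, ref_params, lam=1.0):
    """Quadratic penalty that discourages moving weights deemed important
    for past data. `importance` and `ref_params` are dicts of tensors keyed
    by parameter name (stand-ins for the paper's synapse/neuron importance
    term, not its exact definition)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - ref_params[name]) ** 2).sum()
    return lam * penalty


class EpsilonGreedyBandit:
    """Toy multi-armed bandit that decides, at each retraining step, which
    data pool (e.g. 'old' vs 'new') the next batch is drawn from."""

    def __init__(self, arms, eps=0.1):
        self.arms, self.eps = list(arms), eps
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        if torch.rand(1).item() < self.eps:                    # explore
            return self.arms[torch.randint(len(self.arms), (1,)).item()]
        return max(self.arms, key=lambda a: self.values[a])    # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

During serving, the reward fed back to the bandit could be the held-out improvement obtained after training on the chosen pool, while the penalty is simply added to the task loss so that important weights stay close to their previous values.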
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
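A rough sketch of the stated idea: a small value network trained to capture post-training changes at the logits level, then combined with another frozen pre-trained model at inference. The plain additive combination below is an assumption made for illustration, not necessarily the paper's exact rule.

```python
import torch

@torch.no_grad()
def predict_with_value_net(backbone, value_net, input_ids):
    """Add the value network's logit-level shift to a frozen backbone's logits.
    `backbone` and `value_net` are any callables returning logits of the same
    shape (batch, vocab); both names are hypothetical."""
    base_logits = backbone(input_ids)
    delta_logits = value_net(input_ids)
    return torch.argmax(base_logits + delta_logits, dim=-1)
```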
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem [2.598133279943607]
We present a framework where each new ability (a skill) is represented as a basis function.
We find analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute.
Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
arXiv Detail & Related papers (2024-04-26T17:45:32Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence demonstrating that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
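A generic gradient-projection sketch of the idea: updates used for unlearning are projected onto the orthogonal complement of a subspace intended to capture directions important for the retained data. How that subspace is actually constructed in PGU is not reproduced here; the QR-based basis below is an assumption.

```python
import torch

def project_out(grad, basis):
    """Remove from `grad` its component inside the subspace spanned by the
    orthonormal columns of `basis`, so the unlearning update interferes
    less with knowledge encoded along those directions."""
    return grad - basis @ (basis.T @ grad)

# Illustrative usage: build a basis from a few retained-data gradients.
d, k = 10, 3
retained_grads = torch.randn(d, k)              # placeholder gradients
basis, _ = torch.linalg.qr(retained_grads)      # (d, k), orthonormal columns
g_forget = torch.randn(d)                       # gradient of the forgetting objective
g_update = project_out(g_forget, basis)         # apply this instead of g_forget
```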
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks [12.525959293825318]
We introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for deep neural networks (DNNs).
LURE alternates between an unlearning phase, which selectively forgets undesirable information in the model, and a relearning phase, which emphasizes learning on generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
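A loose sketch of an alternating unlearn/relearn loop. The forgetting step used here (re-initializing a small random fraction of weights) is purely a placeholder; the summary does not say how LURE decides what to forget.

```python
import torch

def unlearn_step(model, frac=0.05):
    """Placeholder forgetting: re-initialize a random fraction of weights."""
    with torch.no_grad():
        for p in model.parameters():
            mask = torch.rand_like(p) < frac
            p[mask] = 0.01 * torch.randn_like(p)[mask]

def relearn_epoch(model, loader, loss_fn, opt):
    """Ordinary training pass, emphasizing relearning after the unlearn step."""
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Training alternates the two phases:
#   unlearn_step(model); relearn_epoch(model, loader, loss_fn, opt)
```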
arXiv Detail & Related papers (2023-03-18T16:45:54Z)
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation [15.309573393914462]
Neural networks tend to forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions.
This problem is called catastrophic forgetting, which is a fundamental challenge in the continual learning of neural networks.
We propose Complementary Online Knowledge Distillation (COKD), which uses dynamically updated teacher models trained on specific data orders to iteratively provide complementary knowledge to the student model.
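The distillation component can be sketched as a standard multi-teacher knowledge-distillation loss: cross-entropy on the labels plus an averaged KL term toward several teacher distributions. COKD's online teacher updates and data-order scheduling are omitted here.

```python
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, targets,
                          alpha=0.5, T=2.0):
    """Cross-entropy on targets plus averaged, temperature-scaled KL
    divergence toward each teacher's output distribution."""
    ce = F.cross_entropy(student_logits, targets)
    kd = 0.0
    for t_logits in teacher_logits_list:
        kd = kd + F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(t_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
    kd = kd / len(teacher_logits_list)
    return (1.0 - alpha) * ce + alpha * kd
```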
arXiv Detail & Related papers (2022-03-08T08:08:45Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensembling reduces regressions.
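One common way to relax such a constrained formulation is a penalty that discourages the updated model from contradicting the old model on examples the old model already classified correctly; the specific penalty below is an illustrative choice, not necessarily the paper's.

```python
import torch.nn.functional as F

def regression_aware_loss(new_logits, old_logits, targets, beta=1.0):
    """Task loss plus a penalty that pulls the new model toward the old
    model's distribution only on samples the old model got right."""
    task = F.cross_entropy(new_logits, targets)
    old_correct = (old_logits.argmax(dim=-1) == targets).float()
    per_sample_kl = F.kl_div(
        F.log_softmax(new_logits, dim=-1),
        F.softmax(old_logits, dim=-1),
        reduction="none",
    ).sum(dim=-1)
    return task + beta * (old_correct * per_sample_kl).mean()
```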
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
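The Bayesian starting point behind this line of work can be written compactly. The identity below is just Bayes' rule under the usual assumption that data points are conditionally independent given the parameters; it is not the evidence upper bound the paper actually optimizes.

```latex
% Exact unlearning of an erased subset D_e from the full-data posterior:
p(\theta \mid \mathcal{D} \setminus \mathcal{D}_e)
  \;\propto\; \frac{p(\theta \mid \mathcal{D})}{p(\mathcal{D}_e \mid \theta)}
```

With variational inference only an approximation $q(\theta) \approx p(\theta \mid \mathcal{D})$ is available, so this division must be carried out against an approximate posterior, which is the extra difficulty the summary points to.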
arXiv Detail & Related papers (2020-10-24T11:53:00Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm achieves significant improvements over previous methods.
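A minimal sketch of the student-teacher interaction described above: a small weighting network consumes per-example signals from the student (here just a pooled feature and the per-example loss, an assumed choice) and outputs weights for the student's loss.

```python
import torch
import torch.nn as nn

class WeightingTeacher(nn.Module):
    """Maps per-example student signals to a weight in (0, 1)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + 1, 32),
                                 nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, student_feats, per_example_loss):
        x = torch.cat([student_feats, per_example_loss.unsqueeze(-1)], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

def reweighted_loss(per_example_loss, student_feats, teacher):
    """Weight each example's loss by the teacher's output."""
    w = teacher(student_feats, per_example_loss.detach())
    return (w * per_example_loss).mean()
```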
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering a multi-task network, training of the student model's feature extractor becomes more effective.
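The outlier-rejection loss can be sketched as follows: samples whose label is far from the teacher's prediction are treated as likely outliers and dropped before computing the student's regression loss. The median-based threshold is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def teacher_outlier_rejection_loss(student_pred, teacher_pred, targets, tau=2.0):
    """Drop samples whose target deviates strongly from the teacher's
    prediction, then train the student on the remaining ones.
    All tensors are 1-D (one value per sample)."""
    residual = (teacher_pred - targets).abs()
    keep = residual < tau * residual.median()
    if keep.sum() == 0:                     # degenerate batch: keep everything
        keep = torch.ones_like(keep)
    return F.mse_loss(student_pred[keep], targets[keep])
```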
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
- Frosting Weights for Better Continual Training [22.554993259239307]
Training a neural network model can be a lifelong learning process and is a computationally intensive one.
Deep neural network models can suffer from catastrophic forgetting during retraining on new data.
We propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the problem.
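The gradient-boosting flavor can be pictured as freezing the original network and fitting a small additional network to correct it on new data, so the old weights (and the knowledge they encode) are left untouched. The additive combination below is an illustrative reading of the summary, not the paper's exact procedure.

```python
import torch

def boosted_forward(base_model, delta_model, x):
    """Frozen base network plus a small 'boosting' network that corrects it."""
    with torch.no_grad():
        base_out = base_model(x)
    return base_out + delta_model(x)

def train_delta_on_new_data(base_model, delta_model, loader, opt, loss_fn):
    """Only delta_model is updated; base_model stays frozen."""
    for x, y in loader:
        opt.zero_grad()
        loss_fn(boosted_forward(base_model, delta_model, x), y).backward()
        opt.step()
```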
arXiv Detail & Related papers (2020-01-07T00:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.