Frosting Weights for Better Continual Training
- URL: http://arxiv.org/abs/2001.01829v1
- Date: Tue, 7 Jan 2020 00:53:46 GMT
- Title: Frosting Weights for Better Continual Training
- Authors: Xiaofeng Zhu, Feng Liu, Goce Trajcevski, Dingding Wang
- Abstract summary: Training a neural network model can be a lifelong learning process and is a computationally intensive one.
Deep neural network models can suffer from catastrophic forgetting during retraining on new data.
We propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the problem.
- Score: 22.554993259239307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a neural network model can be a lifelong learning process and is a
computationally intensive one. A severe adverse effect that may occur in deep
neural network models is that they can suffer from catastrophic forgetting
during retraining on new data. To avoid such disruptions in continuous
learning, one appealing property of ensemble models is their additive nature. In
this paper, we propose two generic ensemble approaches, gradient boosting and
meta-learning, to solve the catastrophic forgetting problem in tuning
pre-trained neural network models.
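The gradient-boosting flavor of this idea can be pictured as freezing the pre-trained network and fitting a small additive correction on the new data. The sketch below is only my minimal illustration under that assumption; the class and attribute names (FrostedModel, frosting) are hypothetical and not from the paper.
```python
import torch
import torch.nn as nn

class FrostedModel(nn.Module):
    """Illustrative sketch: keep the pre-trained base frozen and learn a small
    additive head on new data, in the spirit of gradient boosting."""
    def __init__(self, base: nn.Module, num_classes: int, feat_dim: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the old knowledge
            p.requires_grad = False
        self.frosting = nn.Linear(feat_dim, num_classes)  # small additive learner

    def forward(self, x):
        with torch.no_grad():
            base_logits = self.base(x)            # predictions of the frozen model
        return base_logits + self.frosting(x.flatten(1))  # ensemble by addition

# A retraining loop would update only the frosting head on the new data, e.g.:
# optimizer = torch.optim.Adam(model.frosting.parameters(), lr=1e-3)
```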
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
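As I read the summary, the value network predicts a logit-level correction that is simply added to the base model's logits at inference; a rough sketch under that assumption (the function name is mine, not the paper's API):
```python
import torch

def combined_logits(base_model, value_network, inputs):
    """Hypothetical sketch: compose a frozen pre-trained model with a separately
    trained value network by summing their logits, as the summary suggests."""
    with torch.no_grad():
        base = base_model(inputs)     # logits of the pre-trained model
    delta = value_network(inputs)     # logit-level change learned post-training
    return base + delta               # adjusted output distribution
```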
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Neuromimetic metaplasticity for adaptive continual learning [2.1749194587826026]
We propose a metaplasticity model inspired by human working memory to achieve catastrophic forgetting-free continual learning.
A key aspect of our approach involves implementing distinct types of synapses from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility.
The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications.
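The summary does not give the update rule, but one simple way to picture synapses with different degrees of flexibility is a per-parameter plasticity mask that scales each gradient step; the snippet below is purely illustrative and not the paper's actual model.
```python
import torch

def plasticity_masks(model, flexible_fraction=0.5):
    """Illustrative only: randomly tag each weight as 'stable' (small updates)
    or 'flexible' (full updates), mimicking intermixed synapse types."""
    masks = {}
    for name, p in model.named_parameters():
        flexible = (torch.rand_like(p) < flexible_fraction).float()
        masks[name] = 0.1 + 0.9 * flexible      # stable weights learn 10x slower
    return masks

def masked_sgd_step(model, masks, lr=1e-2):
    """Apply one SGD step scaled by each parameter's plasticity mask."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * masks[name] * p.grad
```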
arXiv Detail & Related papers (2024-07-09T12:21:35Z)
- Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning [0.0]
A new type of neural network is presented using the rapid neural network paradigm.
It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy.
arXiv Detail & Related papers (2023-09-12T22:34:34Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation [15.309573393914462]
Neural networks tend to forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions.
This problem is called catastrophic forgetting, which is a fundamental challenge in the continual learning of neural networks.
We propose Complementary Online Knowledge Distillation (COKD), which uses dynamically updated teacher models trained on specific data orders to iteratively provide complementary knowledge to the student model.
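The summary describes COKD as dynamically updated teachers, trained on specific data orders, distilling complementary knowledge into the student online; the exact procedure is in the paper, but the generic multi-teacher distillation loss it builds on looks roughly like this (the temperature and weighting below are my own placeholder choices):
```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, targets,
                                    temperature=2.0, alpha=0.5):
    """Generic sketch: cross-entropy on the current batch plus an averaged
    KL term that transfers knowledge from several teacher models."""
    ce = F.cross_entropy(student_logits, targets)
    kd = 0.0
    for t_logits in teacher_logits_list:
        kd = kd + F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(t_logits.detach() / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
    kd = kd / len(teacher_logits_list)
    return alpha * ce + (1.0 - alpha) * kd
```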
arXiv Detail & Related papers (2022-03-08T08:08:45Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters, can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
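The summary does not define its diversity measure; one simple proxy for the idea is the average pairwise cosine similarity between the weight vectors of hidden-layer neurons (lower similarity meaning higher diversity). The snippet below is only that proxy, not the paper's metric.
```python
import torch

def hidden_layer_diversity(weight: torch.Tensor) -> torch.Tensor:
    """Illustrative proxy: 1 minus the mean pairwise cosine similarity of the
    rows of a hidden layer's weight matrix (one row per neuron)."""
    w = torch.nn.functional.normalize(weight, dim=1)   # unit-norm neuron vectors
    sim = w @ w.t()                                    # cosine similarity matrix
    n = sim.shape[0]
    off_diag = (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
    return 1.0 - off_diag
```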
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks [2.580765958706854]
We describe and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks.
PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data.
For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.
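The summary names the remedy (inverse-Dirichlet weighting) without spelling it out; as I understand it, the scheme balances multi-objective PINN training by weighting each loss term in inverse proportion to the spread of its gradients. The sketch below reflects that reading rather than the paper's exact formula.
```python
import torch

def inverse_variance_weights(loss_terms, params):
    """Sketch of variance-based loss balancing: weight each PINN loss term by
    the largest observed gradient variance divided by its own gradient variance,
    so that no single scale dominates training. (My reading of the idea, not
    the paper's exact formula.)"""
    variances = []
    for loss in loss_terms:
        grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        flat = torch.cat([g.flatten() for g in grads if g is not None])
        variances.append(flat.var())
    v_max = torch.stack(variances).max()
    return [(v_max / (v + 1e-12)).detach() for v in variances]
```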
arXiv Detail & Related papers (2021-07-02T10:01:37Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
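The link between training speed and marginal likelihood rests on the chain-rule decomposition log p(D) = sum_i log p(y_i | y_1..y_{i-1}): summing the model's predictive log-likelihoods as data points are incorporated one at a time yields the estimate. A hedged sketch of that accumulation follows (the callbacks fit and log_predictive are hypothetical; the paper's actual estimator is more refined).
```python
def log_marginal_likelihood_estimate(model, data_stream, fit, log_predictive):
    """Hedged sketch: accumulate log p(y_i | y_<i) while the model is trained on
    one example at a time; models that train faster accrue a larger sum."""
    total = 0.0
    for x, y in data_stream:
        total += log_predictive(model, x, y)   # predict before seeing the example
        fit(model, x, y)                       # then update on it
    return total
```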
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Neural Network Retraining for Model Serving [32.857847595096025]
We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
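The summary names the two challenges (catastrophic forgetting and efficient retraining) without the mechanism; a common pattern that addresses both, and a reasonable mental model here, is incremental retraining on the new batch mixed with a small rehearsal buffer of old examples. The sketch below shows that generic pattern, not necessarily the paper's method.
```python
import random

def incremental_retrain(model, new_batch, rehearsal_buffer, train_step,
                        buffer_size=1000, replay_ratio=0.5):
    """Generic sketch: retrain only on the new data plus a replayed sample of
    old data, refreshing old knowledge without a full retraining pass."""
    k = int(len(new_batch) * replay_ratio)
    replay = random.sample(rehearsal_buffer, min(k, len(rehearsal_buffer)))
    train_step(model, new_batch + replay)       # one cheap incremental update

    # keep a bounded reservoir of past examples for future retraining rounds
    rehearsal_buffer.extend(new_batch)
    del rehearsal_buffer[:-buffer_size]
    return model
```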
arXiv Detail & Related papers (2020-04-29T13:52:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.