Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced
Training for Neural Machine Translation
- URL: http://arxiv.org/abs/2203.03910v1
- Date: Tue, 8 Mar 2022 08:08:45 GMT
- Title: Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced
Training for Neural Machine Translation
- Authors: Chenze Shao, Yang Feng
- Abstract summary: Neural networks tend to forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions.
This problem is called \textit{catastrophic forgetting}, which is a fundamental challenge in the continual learning of neural networks.
We propose Complementary Online Knowledge Distillation (COKD), which uses dynamically updated teacher models trained on specific data orders to iteratively provide complementary knowledge to the student model.
- Score: 15.309573393914462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks tend to gradually forget the previously learned knowledge
when learning multiple tasks sequentially from dynamic data distributions. This
problem is called \textit{catastrophic forgetting}, which is a fundamental
challenge in the continual learning of neural networks. In this work, we
observe that catastrophic forgetting not only occurs in continual learning but
also affects the traditional static training. Neural networks, especially
neural machine translation models, suffer from catastrophic forgetting even if
they learn from a static training set. To be specific, the final model pays
imbalanced attention to training samples, where recently exposed samples
attract more attention than earlier samples. The underlying cause is that
training samples do not get balanced training in each model update, so we name
this problem \textit{imbalanced training}. To alleviate this problem, we
propose Complementary Online Knowledge Distillation (COKD), which uses
dynamically updated teacher models trained on specific data orders to
iteratively provide complementary knowledge to the student model. Experimental
results on multiple machine translation tasks show that our method successfully
alleviates the problem of imbalanced training and achieves substantial
improvements over strong baseline systems.
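The abstract describes COKD only at a high level, so the sketch below shows one way such complementary online distillation could look in PyTorch: a student combines its translation loss with a distillation term from teachers trained on different data orders. The function names, the round-robin teacher schedule, and the hyperparameters (alpha, temperature) are illustrative assumptions, not the authors' released implementation.
```python
# Minimal COKD-style sketch (assumptions, not the authors' code): teachers are
# trained on their own data orders elsewhere; here the student mixes its
# cross-entropy loss with a distillation term from one teacher per step.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL-based knowledge distillation loss."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def cokd_epoch(student, teachers, batches, optimizer, alpha=0.5):
    for step, (src, tgt) in enumerate(batches):
        teacher = teachers[step % len(teachers)]   # round-robin schedule: illustrative only
        with torch.no_grad():
            teacher_logits = teacher(src, tgt)     # teacher-forced logits from a teacher
        student_logits = student(src, tgt)
        ce = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)), tgt.view(-1)
        )
        loss = (1.0 - alpha) * ce + alpha * kd_loss(student_logits, teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```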
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
When the image translation model is initialized with the denoised weights predicted by our diffusion model, training requires only 43.3 seconds.
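A rough sketch of the general idea follows, under the assumption that the generator outputs a flat vector matching the target model's parameter count; `weight_generator` and `condition` are hypothetical placeholders rather than the paper's diffusion-based generator.
```python
# Illustrative only: initialize a model from weights predicted by a separate
# weight generator, then fine-tune from that starting point.
import torch

def init_from_generated_weights(model, weight_generator, condition):
    flat = weight_generator(condition)             # 1-D tensor of predicted weights
    offset = 0
    with torch.no_grad():
        for param in model.parameters():
            n = param.numel()
            param.copy_(flat[offset:offset + n].view_as(param))
            offset += n
    return model                                   # fine-tune from this initialization
```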
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
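As a hedged illustration of this recipe, the standard PyTorch/torchvision components already cover label smoothing and augmentation; the values below are placeholders to be tuned on a validation split, not the paper's settings.
```python
# Sketch of the "tune standard components" recipe: no specialized imbalance
# loss, just label smoothing, stronger augmentation, and a tuned batch size.
import torch.nn as nn
from torchvision import transforms

train_transform = transforms.Compose([                 # stronger data augmentation
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # built-in label smoothing
batch_size = 128                                        # chosen by validation performance
```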
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning aims to incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net).
IF2Net allows a single network to inherently learn unlimited mapping rules without being told task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks [12.525959293825318]
We introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for deep neural networks (DNNs).
LURE alternates between an unlearning phase, which selectively forgets undesirable information in the model, and a relearning phase, which emphasizes learning generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
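A minimal sketch of such an unlearn/relearn cycle is shown below; the "reinitialize low-magnitude weights" rule is an assumption chosen for illustration and may differ from LURE's actual selective-forgetting criterion.
```python
# Hedged sketch of an unlearn/relearn cycle in the spirit of LURE.
import torch

def unlearn(model, fraction=0.2):
    """Selectively 'forget' by reinitializing a fraction of each weight matrix."""
    with torch.no_grad():
        for param in model.parameters():
            if param.dim() < 2:
                continue
            k = int(param.numel() * fraction)
            if k == 0:
                continue
            threshold = param.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
            mask = param.abs() <= threshold
            param[mask] = torch.randn_like(param)[mask] * 0.01    # re-initialize those weights

def relearn(model, loader, optimizer, criterion, epochs=1):
    """Standard training pass that follows each unlearning phase."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```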
arXiv Detail & Related papers (2023-03-18T16:45:54Z)
- DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning [5.2319020651074215]
We propose a Curriculum-guided Contrastive Learning framework for neural Predictor (DCLP).
Our method simplifies the contrastive task by designing a novel curriculum that enhances the stability of the unlabeled training data distribution.
We experimentally demonstrate that DCLP has high accuracy and efficiency compared with existing predictors.
arXiv Detail & Related papers (2023-02-25T08:16:21Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exit dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
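The sketch below illustrates the additive framing under stated assumptions (module names, number of exits, and lazily initialized linear heads are placeholders): each exit head adds its logits to the running sum from earlier exits, in the spirit of gradient boosting.
```python
# Sketch of an early-exit network framed as an additive (boosted) model.
import torch
import torch.nn as nn

class BoostedEarlyExitNet(nn.Module):
    def __init__(self, blocks, num_classes):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)                        # backbone stages
        self.heads = nn.ModuleList(
            [nn.LazyLinear(num_classes) for _ in blocks]           # one exit head per stage
        )

    def forward(self, x):
        logits_sum = 0
        exit_logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            features = x.flatten(1)                    # flatten per-stage features for the head
            logits_sum = logits_sum + head(features)   # additive, boosting-style refinement
            exit_logits.append(logits_sum)
        return exit_logits                             # supervise every exit during training
```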
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI [10.374979214803805]
Catastrophic forgetting describes the phenomenon in which a neural network completely forgets previously learned knowledge when given new information.
We propose a novel training algorithm, called training by explaining, which leverages Layer-wise Relevance Propagation (LRP) to retain the information a neural network has already learned on previous tasks while training on new data.
Our method not only successfully retains the knowledge of old tasks within the neural network but also does so more resource-efficiently than other state-of-the-art solutions.
arXiv Detail & Related papers (2022-05-04T08:00:49Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
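One simple way to make such an analysis concrete, assuming a cosine-similarity notion of diversity that may differ from the paper's formulation, is a regularizer that penalizes similar hidden-unit weight vectors:
```python
# Illustrative diversity regularizer (an assumption, not the paper's exact
# method): penalize pairwise cosine similarity between hidden-unit weight
# vectors so neurons are pushed toward distinct features.
import torch
import torch.nn.functional as F

def neuron_diversity_penalty(weight):
    """weight: (hidden_units, inputs) matrix of a hidden layer."""
    w = F.normalize(weight, dim=1)                  # unit-length rows
    sim = w @ w.t()                                 # pairwise cosine similarities
    off_diag = sim - torch.diag(torch.diag(sim))    # drop self-similarity terms
    return off_diag.abs().mean()

# usage: loss = task_loss + lam * neuron_diversity_penalty(model.hidden.weight)
```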
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
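A hedged sketch of the idea follows, assuming the student exposes a per-example internal state and the teacher is a small MLP; both are illustrative assumptions, and the teacher's own meta-update is omitted.
```python
# Sketch of teacher-driven example reweighting: a small "teacher" maps the
# student's per-example loss and an internal-state summary to a sample weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReweightTeacher(nn.Module):
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, per_example_loss, student_state):
        feats = torch.cat([per_example_loss.unsqueeze(1), student_state], dim=1)
        return torch.sigmoid(self.net(feats)).squeeze(1)     # weight in (0, 1) per example

def weighted_step(student, teacher, x, y, optimizer):
    # Assumes the student returns (logits, internal_state) with state shape (batch, state_dim).
    logits, state = student(x)
    losses = F.cross_entropy(logits, y, reduction="none")
    with torch.no_grad():                                    # teacher updated by its own meta-step
        weights = teacher(losses, state)
    loss = (weights * losses).sum() / weights.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```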
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- Neural Network Retraining for Model Serving [32.857847595096025]
We propose incremental (re)training of a neural network model to cope with a continuous flow of new data arriving at inference time.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
arXiv Detail & Related papers (2020-04-29T13:52:28Z)
- Frosting Weights for Better Continual Training [22.554993259239307]
Training a neural network model can be a lifelong learning process and is a computationally intensive one.
Deep neural network models can suffer from catastrophic forgetting during retraining on new data.
We propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the problem.
arXiv Detail & Related papers (2020-01-07T00:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.