Understanding Learning Dynamics for Neural Machine Translation
- URL: http://arxiv.org/abs/2004.02199v1
- Date: Sun, 5 Apr 2020 13:32:58 GMT
- Title: Understanding Learning Dynamics for Neural Machine Translation
- Authors: Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi
- Abstract summary: We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) [lan-2019-loss-change-allocation].
As LCA requires calculating the gradient over the entire dataset for each update, we instead present an approximation to put it into practice in the NMT scenario.
Our simulated experiment shows that this approximate calculation is efficient and empirically delivers results consistent with the brute-force implementation.
- Score: 53.23463279153577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great success of NMT, there still remains a severe challenge: it
is hard to interpret the internal dynamics during its training process. In this
paper we propose to understand the learning dynamics of NMT by using a recently
proposed technique named Loss Change Allocation
(LCA)~\citep{lan-2019-loss-change-allocation}. As LCA requires calculating the
gradient over the entire dataset for each update, we instead present an
approximation to put it into practice in the NMT scenario. Our simulated
experiment shows that this approximate calculation is efficient and
empirically delivers results consistent with the brute-force
implementation. In particular, extensive experiments on two
standard translation benchmark datasets reveal some valuable findings.
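To make the decomposition concrete, below is a minimal NumPy sketch of the LCA bookkeeping and of the kind of sampled-gradient approximation the abstract describes. It is an illustration under stated assumptions (a toy linear-regression loss stands in for the NMT model; the step count, batch size, and sample size are made-up values), not the authors' implementation.

```python
# Hedged sketch of Loss Change Allocation (LCA). LCA attributes the loss change
# of each training step to individual parameters via a first-order decomposition:
#     L(theta_{t+1}) - L(theta_t)  ~=  sum_i  g_i(theta_t) * (theta_{t+1, i} - theta_{t, i})
# where g is the gradient of the loss over the ENTIRE training set. The paper's
# approximation replaces that full-dataset gradient with a sampled estimate so the
# bookkeeping stays affordable at NMT scale.

import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset" and linear-regression loss, standing in for an NMT model.
X = rng.normal(size=(512, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=512)

def loss_and_grad(w, idx):
    """Mean squared error and its gradient on the rows selected by idx."""
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb
    return 0.5 * np.mean(err ** 2), (Xb.T @ err) / len(idx)

w = np.zeros(8)
lr, batch_size, sample_size = 0.1, 32, 128   # illustrative hyperparameters
lca = np.zeros_like(w)                       # per-parameter allocation, accumulated over steps

for step in range(200):
    batch = rng.choice(len(X), size=batch_size, replace=False)
    _, g_batch = loss_and_grad(w, batch)     # gradient used for the SGD update
    delta = -lr * g_batch                    # parameter update of this step

    # Exact LCA would evaluate the gradient on the full dataset here; the
    # approximation uses a larger random sample instead (cheaper, and reported
    # in the abstract to give results consistent with the brute-force version).
    sample = rng.choice(len(X), size=sample_size, replace=False)
    _, g_alloc = loss_and_grad(w, sample)
    lca += g_alloc * delta                   # allocate this step's loss change per parameter

    w += delta

all_idx = np.arange(len(X))
print("allocated loss change per parameter:", np.round(lca, 4))
print("total allocated:", lca.sum())
print("actual loss change:",
      loss_and_grad(w, all_idx)[0] - loss_and_grad(np.zeros(8), all_idx)[0])
```

Summing the per-parameter allocations recovers (approximately) the total loss change, which is what lets the analysis ask which parameters, layers, or modules the training progress should be credited to.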
Related papers
- Importance-Aware Data Augmentation for Document-Level Neural Machine Translation [51.74178767827934]
Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive.
Due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity.
We propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients.
arXiv Detail & Related papers (2024-01-27T09:27:47Z)
- Code-Switching with Word Senses for Pretraining in Neural Machine Translation [107.23743153715799]
We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
arXiv Detail & Related papers (2023-10-21T16:13:01Z)
- Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer [44.02848852485475]
Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks.
We comprehensively analyze $k$NN-MT through theoretical and empirical studies (a generic sketch of the $k$NN-MT interpolation step appears after this list).
arXiv Detail & Related papers (2023-05-22T13:38:53Z)
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it.
We conduct experiments on the NIST Chinese-to-English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
- Nearest Neighbor Knowledge Distillation for Neural Machine Translation [50.0624778757462]
$k$-nearest-neighbor machine translation ($k$NN-MT) has achieved many state-of-the-art results in machine translation tasks.
$k$NN-KD trains the base NMT model to directly learn the knowledge of $k$NN.
arXiv Detail & Related papers (2022-05-01T14:30:49Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Dynamic Curriculum Learning for Low-Resource Neural Machine Translation [27.993407441922507]
We investigate the effective use of training data for low-resource NMT.
In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples during training.
This eases training by highlighting easy samples that the current model has enough competence to learn.
arXiv Detail & Related papers (2020-11-30T08:13:41Z)
- Unsupervised Neural Machine Translation for Low-Resource Domains via Meta-Learning [27.86606560170401]
We present a novel meta-learning algorithm for unsupervised neural machine translation (UNMT).
We train the model to adapt to another domain by utilizing only a small amount of training data.
Our model surpasses a transfer learning-based approach by up to 2-4 BLEU scores.
arXiv Detail & Related papers (2020-10-18T17:54:13Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)
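Two of the entries above build on $k$NN-MT. For reference, here is a hedged NumPy sketch of the standard $k$NN-MT interpolation step: the decoder hidden state queries a datastore of (hidden state, target token) pairs, the retrieved neighbors are turned into a token distribution, and that distribution is interpolated with the base NMT softmax at the output projection layer. The datastore contents, sizes, and hyperparameters below are invented for illustration; this is the generic published idea, not code from either listed paper.

```python
# Hedged, self-contained sketch of the kNN-MT interpolation step.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim, store_size, k = 100, 16, 1000, 8
temperature, lam = 10.0, 0.5          # illustrative retrieval temperature and mixing weight

# Datastore: decoder hidden states paired with their gold target token ids.
store_keys = rng.normal(size=(store_size, hidden_dim))
store_vals = rng.integers(0, vocab_size, size=store_size)

def knn_mt_step(hidden, p_nmt):
    """Interpolate the base NMT distribution with a kNN retrieval distribution."""
    dists = np.sum((store_keys - hidden) ** 2, axis=1)   # squared L2 distances to all keys
    nn = np.argsort(dists)[:k]                           # indices of the k nearest neighbors
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, store_vals[nn], weights)            # aggregate neighbor weights by token id
    return lam * p_knn + (1.0 - lam) * p_nmt

hidden = rng.normal(size=hidden_dim)                     # stand-in decoder state for one step
p_nmt = rng.dirichlet(np.ones(vocab_size))               # stand-in base NMT softmax output
p_final = knn_mt_step(hidden, p_nmt)
print("sums to 1:", np.isclose(p_final.sum(), 1.0), "| argmax token:", p_final.argmax())
```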