Understanding Learning Dynamics for Neural Machine Translation
- URL: http://arxiv.org/abs/2004.02199v1
- Date: Sun, 5 Apr 2020 13:32:58 GMT
- Title: Understanding Learning Dynamics for Neural Machine Translation
- Authors: Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi
- Abstract summary: We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) [lan-2019-loss-change-allocation].
As LCA requires calculating the gradient over the entire dataset for each update, we instead present an approximation to put it into practice in the NMT scenario.
Our simulated experiment shows that this approximate calculation is efficient and empirically delivers results consistent with the brute-force implementation.
- Score: 53.23463279153577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great success of NMT, there still remains a severe challenge: it
is hard to interpret the internal dynamics during its training process. In this
paper we propose to understand the learning dynamics of NMT by using a recently
proposed technique named Loss Change Allocation
(LCA)~\citep{lan-2019-loss-change-allocation}. As LCA requires calculating the
gradient over the entire dataset for each update, we instead present an
approximation to put it into practice in the NMT scenario. Our simulated
experiment shows that this approximate calculation is efficient and
empirically delivers results consistent with the brute-force
implementation. In particular, extensive experiments on two
standard translation benchmark datasets reveal some valuable findings.
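To make the decomposition concrete, below is a minimal NumPy sketch of the LCA bookkeeping and of the kind of sampled-gradient approximation the abstract describes. It is an illustration under stated assumptions (a toy linear-regression loss stands in for the NMT model; the step count, batch size, and sample size are made-up values), not the authors' implementation.

```python
# Hedged sketch of Loss Change Allocation (LCA). LCA attributes the loss change
# of each training step to individual parameters via a first-order decomposition:
#     L(theta_{t+1}) - L(theta_t)  ~=  sum_i  g_i(theta_t) * (theta_{t+1, i} - theta_{t, i})
# where g is the gradient of the loss over the ENTIRE training set. The paper's
# approximation replaces that full-dataset gradient with a sampled estimate so the
# bookkeeping stays affordable at NMT scale.

import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset" and linear-regression loss, standing in for an NMT model.
X = rng.normal(size=(512, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=512)

def loss_and_grad(w, idx):
    """Mean squared error and its gradient on the rows selected by idx."""
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb
    return 0.5 * np.mean(err ** 2), (Xb.T @ err) / len(idx)

w = np.zeros(8)
lr, batch_size, sample_size = 0.1, 32, 128   # illustrative hyperparameters
lca = np.zeros_like(w)                       # per-parameter allocation, accumulated over steps

for step in range(200):
    batch = rng.choice(len(X), size=batch_size, replace=False)
    _, g_batch = loss_and_grad(w, batch)     # gradient used for the SGD update
    delta = -lr * g_batch                    # parameter update of this step

    # Exact LCA would evaluate the gradient on the full dataset here; the
    # approximation uses a larger random sample instead (cheaper, and reported
    # in the abstract to give results consistent with the brute-force version).
    sample = rng.choice(len(X), size=sample_size, replace=False)
    _, g_alloc = loss_and_grad(w, sample)
    lca += g_alloc * delta                   # allocate this step's loss change per parameter

    w += delta

all_idx = np.arange(len(X))
print("allocated loss change per parameter:", np.round(lca, 4))
print("total allocated:", lca.sum())
print("actual loss change:",
      loss_and_grad(w, all_idx)[0] - loss_and_grad(np.zeros(8), all_idx)[0])
```

Summing the per-parameter allocations recovers (approximately) the total loss change, which is what lets the analysis ask which parameters, layers, or modules the training progress should be credited to.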
Related papers
- Importance-Aware Data Augmentation for Document-Level Neural Machine Translation [51.74178767827934]
Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive.
Due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity.
We propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients.
arXiv Detail & Related papers (2024-01-27T09:27:47Z)
- Code-Switching with Word Senses for Pretraining in Neural Machine Translation [107.23743153715799]
We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
arXiv Detail & Related papers (2023-10-21T16:13:01Z)
- Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer [44.02848852485475]
Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks.
We comprehensively analyze $k$NN-MT through theoretical and empirical studies (a generic sketch of the $k$NN-MT interpolation step appears after this list).
arXiv Detail & Related papers (2023-05-22T13:38:53Z)
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it.
We conduct experiments on the NIST Chinese-to-English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
- Nearest Neighbor Knowledge Distillation for Neural Machine Translation [50.0624778757462]
$k$-nearest-neighbor machine translation ($k$NN-MT) has achieved many state-of-the-art results in machine translation tasks.
$k$NN-KD trains the base NMT model to directly learn the knowledge of $k$NN.
arXiv Detail & Related papers (2022-05-01T14:30:49Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Dynamic Curriculum Learning for Low-Resource Neural Machine Translation [27.993407441922507]
We investigate the effective use of training data for low-resource NMT.
In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples during training.
This eases training by highlighting easy samples that the current model has enough competence to learn.
arXiv Detail & Related papers (2020-11-30T08:13:41Z)
- Unsupervised Neural Machine Translation for Low-Resource Domains via Meta-Learning [27.86606560170401]
We present a novel meta-learning algorithm for unsupervised neural machine translation (UNMT).
We train the model to adapt to another domain by utilizing only a small amount of training data.
Our model surpasses a transfer learning-based approach by up to 2-4 BLEU scores.
arXiv Detail & Related papers (2020-10-18T17:54:13Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)
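Two of the entries above build on $k$NN-MT. For reference, here is a hedged NumPy sketch of the standard $k$NN-MT interpolation step: the decoder hidden state queries a datastore of (hidden state, target token) pairs, the retrieved neighbors are turned into a token distribution, and that distribution is interpolated with the base NMT softmax at the output projection layer. The datastore contents, sizes, and hyperparameters below are invented for illustration; this is the generic published idea, not code from either listed paper.

```python
# Hedged, self-contained sketch of the kNN-MT interpolation step.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim, store_size, k = 100, 16, 1000, 8
temperature, lam = 10.0, 0.5          # illustrative retrieval temperature and mixing weight

# Datastore: decoder hidden states paired with their gold target token ids.
store_keys = rng.normal(size=(store_size, hidden_dim))
store_vals = rng.integers(0, vocab_size, size=store_size)

def knn_mt_step(hidden, p_nmt):
    """Interpolate the base NMT distribution with a kNN retrieval distribution."""
    dists = np.sum((store_keys - hidden) ** 2, axis=1)   # squared L2 distances to all keys
    nn = np.argsort(dists)[:k]                           # indices of the k nearest neighbors
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, store_vals[nn], weights)            # aggregate neighbor weights by token id
    return lam * p_knn + (1.0 - lam) * p_nmt

hidden = rng.normal(size=hidden_dim)                     # stand-in decoder state for one step
p_nmt = rng.dirichlet(np.ones(vocab_size))               # stand-in base NMT softmax output
p_final = knn_mt_step(hidden, p_nmt)
print("sums to 1:", np.isclose(p_final.sum(), 1.0), "| argmax token:", p_final.argmax())
```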