Norm-Based Curriculum Learning for Neural Machine Translation
- URL: http://arxiv.org/abs/2006.02014v1
- Date: Wed, 3 Jun 2020 02:22:00 GMT
- Title: Norm-Based Curriculum Learning for Neural Machine Translation
- Authors: Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao
- Abstract summary: A neural machine translation (NMT) system is expensive to train, especially in high-resource settings.
In this paper, we aim to improve the training efficiency of NMT by introducing a novel norm-based curriculum learning method.
The proposed method outperforms strong baselines in terms of BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).
- Score: 45.37588885850862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A neural machine translation (NMT) system is expensive to train, especially
in high-resource settings. As NMT architectures become deeper and wider, this
issue becomes even more severe. In this paper, we aim to improve the training
efficiency of NMT by introducing a novel norm-based curriculum learning method.
We use the norm (i.e., the length or magnitude) of a word embedding as
a measure of 1) the difficulty of the sentence, 2) the competence of the model,
and 3) the weight of the sentence. The norm-based sentence difficulty combines
the advantages of both linguistically motivated and model-based sentence
difficulties: it is easy to compute and captures learning-dependent features.
The norm-based model competence lets NMT learn the curriculum in a fully
automated way, while the norm-based sentence weight further enhances the
learning of the vector representations of the NMT model. Experimental results
on the WMT'14 English-German and WMT'17 Chinese-English translation tasks
demonstrate that the proposed method outperforms strong baselines in terms of
BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).
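The abstract gives the intuition but not the mechanics. The sketch below is a minimal Python illustration of the general recipe under stated assumptions: sentence difficulty is approximated by the mean L2 norm of the word embeddings, competence follows the square-root schedule of competence-based curriculum learning (Platanios et al., 2019), and training only draws from examples whose difficulty percentile is below the current competence. Function names and the exact difficulty, competence, and weighting formulas are illustrative, not the paper's reference implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def sentence_difficulty(token_ids, embedding: nn.Embedding) -> float:
    # Norm-based difficulty: mean L2 norm of the sentence's word embeddings.
    # (One plausible reading of the paper; the exact aggregation may differ.)
    vecs = embedding(torch.tensor(token_ids))      # shape: (seq_len, dim)
    return vecs.norm(dim=-1).mean().item()

def competence(step: int, total_steps: int, c0: float = 0.01) -> float:
    # Square-root competence schedule; the paper replaces the purely
    # time-based term with a norm-based estimate of model competence.
    return min(1.0, float(np.sqrt(c0 ** 2 + (1.0 - c0 ** 2) * step / total_steps)))

def curriculum_pool(examples, difficulties, comp: float):
    # Keep only examples whose difficulty falls below the competence quantile,
    # so the model sees easy sentences first and harder ones as it improves.
    cutoff = np.quantile(difficulties, comp)
    return [ex for ex, d in zip(examples, difficulties) if d <= cutoff]
```

In this sketch, a training loop would recompute `competence(step, total_steps)` every few updates, rebuild the sampling pool with `curriculum_pool`, and optionally scale each sentence's loss by a norm-derived weight, which is the role the paper assigns to the norm-based sentence weight.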
Related papers
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it.
We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Despite neural machine translation being the de facto standard, it is still unclear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Phrase-level Active Learning for Neural Machine Translation [107.28450614074002]
We propose an active learning setting where we can spend a given budget on translating in-domain data.
We select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators.
In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods.
arXiv Detail & Related papers (2021-06-21T19:20:42Z)
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource neural machine translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Better Neural Machine Translation by Extracting Linguistic Information from BERT [4.353029347463806]
Adding linguistic information to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models.
We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates.
arXiv Detail & Related papers (2021-04-07T00:03:51Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)
- Can Automatic Post-Editing Improve NMT? [9.233407096706744]
Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort.
APE has had notable success when used with statistical machine translation (SMT) systems but has been less successful with neural machine translation (NMT) systems.
arXiv Detail & Related papers (2020-09-30T02:34:19Z)
- Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models [72.56058378313963]
We bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z)
- Learning to Multi-Task Learn for Better Neural Machine Translation [53.06405021125476]
Multi-task learning is an elegant approach to injecting linguistics-related biases into neural machine translation models.
We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the biased-MTL setting of interest.
Experiments show the resulting automatically learned training schedulers are competitive with the best, and lead to up to +1.1 BLEU score improvements.
arXiv Detail & Related papers (2020-01-10T03:12:28Z)