Self-Guided Curriculum Learning for Neural Machine Translation
- URL: http://arxiv.org/abs/2105.04475v1
- Date: Mon, 10 May 2021 16:12:14 GMT
- Title: Self-Guided Curriculum Learning for Neural Machine Translation
- Authors: Lei Zhou, Liang Ding, Kevin Duh, Ryohei Sasano, Koichi Takeda
- Abstract summary: We propose a self-guided curriculum strategy to encourage the learning of neural machine translation (NMT) models.
Our approach consistently improves translation performance over a strong Transformer baseline.
- Score: 25.870500301724128
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In the field of machine learning, a well-trained model is assumed to be
able to recover the training labels, i.e. the synthetic labels predicted by the
model should be as close to the ground-truth labels as possible. Inspired by
this, we propose a self-guided curriculum strategy to encourage the learning of
neural machine translation (NMT) models to follow the above recovery criterion,
where we cast the recovery degree of each training example as its learning
difficulty. Specifically, we adopt the sentence-level BLEU score as the proxy
of recovery degree. Unlike existing curricula that rely on linguistic
prior knowledge or third-party language models, our chosen learning difficulty
is better suited to measuring the degree of knowledge mastery of the NMT models.
Experiments on translation benchmarks, including WMT14
English$\Rightarrow$German and WMT17 Chinese$\Rightarrow$English, demonstrate
that our approach consistently improves translation performance over a
strong Transformer baseline.
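The recovery criterion in the abstract can be illustrated with a minimal sketch: translate each training source with the model, score the output against its reference with sentence-level BLEU, and treat low scores as hard examples. Everything named below is an assumption made for illustration and not a detail taken from the paper: the `translate` callable standing in for the trained NMT model, the whitespace tokenization, the add-one smoothing for higher-order n-grams, and the definition of difficulty as 1 - BLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU in [0, 1] (illustrative, not a standard scorer)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            # Assumed add-one smoothing so short sentences do not collapse to zero.
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision)
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision / max_n)


def rank_by_difficulty(examples, translate):
    """Sort (source, reference) pairs from easiest to hardest.

    `translate` is a hypothetical callable wrapping the trained model.
    Returns a list of (difficulty, source, reference) tuples.
    """
    scored = []
    for source, reference in examples:
        recovery = sentence_bleu(translate(source), reference)
        scored.append((1.0 - recovery, source, reference))
    scored.sort(key=lambda item: item[0])
    return scored


if __name__ == "__main__":
    # Toy stand-in for model outputs; a real setup would decode with the NMT model.
    outputs = {"s1": "the cat sat on the mat", "s2": "he book"}
    data = [("s2", "he reads a long book today"), ("s1", "the cat sat on the mat")]
    for difficulty, source, _ in rank_by_difficulty(data, outputs.get):
        print(f"{source}: difficulty={difficulty:.3f}")
```

How the ranked examples are then scheduled during training (for instance, in stages from easy to hard) is not specified in the abstract, so the sketch stops at the ranking step.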
Related papers
- On the Shortcut Learning in Multilingual Neural Machine Translation [95.30470845501141]
This study revisits the commonly cited off-target issue in multilingual neural machine translation (MNMT).
We attribute the off-target issue to the overfitting of the shortcuts of (non-centric, centric) language mappings.
Analyses on learning dynamics show that the shortcut learning generally occurs in the later stage of model training.
arXiv Detail & Related papers (2024-11-15T21:09:36Z)
- Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking [1.3716808114696444]
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages.
This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
arXiv Detail & Related papers (2024-05-07T21:58:45Z)
- MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation [61.65537912700187]
Large Language Models (LLM) have demonstrated their strong ability in the field of machine translation (MT).
We propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner.
arXiv Detail & Related papers (2024-03-14T16:07:39Z)
- Active Learning for Neural Machine Translation [0.0]
We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions of low-resource language translation.
This work uses transformer-based NMT systems: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM), for translating English to Hindi.
arXiv Detail & Related papers (2022-12-30T17:04:01Z)
- Continual Knowledge Distillation for Neural Machine Translation [74.03622486218597]
Parallel corpora are not publicly accessible for data copyright, data privacy and competitive differentiation reasons.
We propose a method called continual knowledge distillation to take advantage of existing translation models to improve one model of interest.
arXiv Detail & Related papers (2022-12-18T14:41:13Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Although neural machine translation is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)