Confidence Based Bidirectional Global Context Aware Training Framework
for Neural Machine Translation
- URL: http://arxiv.org/abs/2202.13663v1
- Date: Mon, 28 Feb 2022 10:24:22 GMT
- Title: Confidence Based Bidirectional Global Context Aware Training Framework
for Neural Machine Translation
- Authors: Chulun Zhou, Fandong Meng, Jie Zhou, Min Zhang, Hongji Wang and
Jinsong Su
- Abstract summary: We propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for neural machine translation (NMT).
Our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on three large-scale translation datasets.
- Score: 74.99653288574892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most dominant neural machine translation (NMT) models are restricted
to making predictions based only on the local context of preceding words, in a
left-to-right manner. Although many previous studies have tried to incorporate
global information into NMT models, there remain limitations on how to
effectively exploit bidirectional global context. In this paper, we propose a
Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework
for NMT, where the NMT model is jointly trained with an auxiliary conditional
masked language model (CMLM). The training consists of two stages: (1)
multi-task joint training; (2) confidence based knowledge distillation. At the
first stage, by sharing encoder parameters, the NMT model is additionally
supervised by the signal from the CMLM decoder that contains bidirectional
global contexts. Moreover, at the second stage, using the CMLM as the teacher,
we further incorporate bidirectional global context into the NMT model,
specifically on its unconfidently predicted target words, via knowledge
distillation.
Experimental results show that our proposed CBBGCA training framework
significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on
three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19
Chinese-to-English and WMT'14 English-to-French, respectively.
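As a rough illustration of the second stage described above, the following PyTorch-style sketch combines the usual cross-entropy loss with a KL distillation term from the CMLM teacher, applied only at target positions where the NMT model's probability for the gold token falls below a threshold. The tensor names, the confidence threshold and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cbbgca_stage2_loss(nmt_logits: torch.Tensor,
                       cmlm_logits: torch.Tensor,
                       target_ids: torch.Tensor,
                       pad_id: int,
                       confidence_threshold: float = 0.5,
                       kd_weight: float = 1.0) -> torch.Tensor:
    """Sketch of confidence-based knowledge distillation (assumed shapes below).

    nmt_logits:  [batch, tgt_len, vocab] from the left-to-right NMT decoder.
    cmlm_logits: [batch, tgt_len, vocab] from the CMLM teacher, which sees
                 bidirectional global context.
    target_ids:  [batch, tgt_len] gold target tokens.
    """
    pad_mask = (target_ids != pad_id).float()

    # Ordinary cross-entropy on the gold target, applied at every position.
    ce = F.cross_entropy(nmt_logits.transpose(1, 2), target_ids,
                         ignore_index=pad_id, reduction="none")

    # Per-token confidence: the probability the NMT model assigns to the gold token.
    student_log_probs = F.log_softmax(nmt_logits, dim=-1)
    gold_prob = student_log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1).exp()

    # Distil only on unconfidently predicted (low-probability) non-pad positions.
    low_conf = (gold_prob < confidence_threshold).float() * pad_mask

    # KL divergence from the CMLM teacher distribution to the NMT student.
    kd = F.kl_div(student_log_probs, F.softmax(cmlm_logits, dim=-1),
                  reduction="none").sum(-1)

    ce_loss = (ce * pad_mask).sum() / pad_mask.sum().clamp_min(1.0)
    kd_loss = (kd * low_conf).sum() / low_conf.sum().clamp_min(1.0)
    return ce_loss + kd_weight * kd_loss
```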
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Active Learning for Neural Machine Translation [0.0]
We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions for low-resource language translation.
This work uses four transformer-based NMT systems for translating English to Hindi: a baseline model (BM), a fully trained model (FTM), an active learning least confidence based model (ALLCM) and an active learning margin sampling based model (ALMSM).
arXiv Detail & Related papers (2022-12-30T17:04:01Z)
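The two acquisition criteria named in this entry can be sketched as follows; the scoring functions, the toy probability vectors and the selection loop are illustrative assumptions, not code from the paper.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> float:
    """Score = 1 - probability of the most likely option.
    A higher score means the model is less confident, so the sentence is a
    better candidate for human annotation."""
    return 1.0 - float(np.max(probs))

def margin_sampling(probs: np.ndarray) -> float:
    """Score = negative margin between the two most likely options.
    A small margin means the model is torn between candidates."""
    top2 = np.sort(probs)[-2:]
    return -float(top2[1] - top2[0])

# Toy pool of unlabeled sentences, each summarised by a probability vector;
# the sentence with the highest score is sent for annotation.
pool = {
    "sent_a": np.array([0.70, 0.20, 0.10]),
    "sent_b": np.array([0.40, 0.35, 0.25]),
}
print(max(pool, key=lambda s: least_confidence(pool[s])))  # sent_b
print(max(pool, key=lambda s: margin_sampling(pool[s])))   # sent_b
```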
- Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation [23.401781865904386]
We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English.
For the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English.
For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation.
arXiv Detail & Related papers (2022-09-06T21:20:41Z)
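The second stage above amounts to iterative bidirectional back-translation; a schematic sketch is shown below, where `translate` and `retrain` are hypothetical placeholders for the underlying NMT toolkit rather than the authors' API.

```python
from typing import Callable, List, Tuple

def bidirectional_back_translation(
    translate: Callable[[str, str], str],            # (sentence, direction) -> translation
    retrain: Callable[[List[Tuple[str, str]]], None],
    mono_xx: List[str],                              # monolingual text in the unseen language
    mono_en: List[str],                              # monolingual English text
    rounds: int = 2,
) -> None:
    """Schematic iterative back-translation loop (a sketch, not the authors' code)."""
    for _ in range(rounds):
        # Back-translate monolingual English into the unseen language; the
        # (synthetic xx, real en) pairs supervise the xx -> en direction.
        xx_to_en = [(translate(e, "en->xx"), e) for e in mono_en]

        # Back-translate unseen-language text into English; the
        # (synthetic en, real xx) pairs supervise the en -> xx direction.
        en_to_xx = [(translate(x, "xx->en"), x) for x in mono_xx]

        # Retrain the single multilingual model on both synthetic directions,
        # so each round's improved model yields better data for the next round.
        retrain(xx_to_en + en_to_xx)
```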
- Data Selection Curriculum for Neural Machine Translation [31.55953464971441]
We introduce a two-stage curriculum training framework for NMT models.
We fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring.
Our curriculum strategies consistently yield better translation quality (up to +2.2 BLEU improvement) and faster convergence.
arXiv Detail & Related papers (2022-03-25T19:08:30Z)
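A toy sketch of the selection step this entry describes: score the parallel sentence pairs with some quality scorer and fine-tune on the top-scoring subset. The scorer, the keep ratio and the function names are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Tuple

def select_curriculum_subset(pairs: List[Tuple[str, str]],
                             score_fn: Callable[[Tuple[str, str]], float],
                             keep_ratio: float = 0.5) -> List[Tuple[str, str]]:
    """Keep the highest-scoring fraction of the parallel data for fine-tuning.
    score_fn could be a pre-trained quality scorer or an online score from the
    model being trained; here it is left abstract."""
    ranked = sorted(pairs, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

# Usage with a trivial placeholder scorer (longer pairs score higher).
data = [("hello", "hallo"), ("a much longer sentence", "ein viel laengerer Satz")]
print(select_curriculum_subset(data, score_fn=lambda p: float(len(p[0]))))
```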
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- Self-Learning for Zero Shot Neural Machine Translation [13.551731309506874]
This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data.
Compared to unsupervised NMT, consistent improvements are observed even in a domain-mismatch setting.
arXiv Detail & Related papers (2021-03-10T09:15:19Z)
- Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings [51.47607125262885]
We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text.
We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model via self-training.
We validate our technique by extracting parallel sentence pairs on the BUCC 2017 bitext mining task and observe up to a 24.5 point increase (absolute) in F1 scores over previous unsupervised methods.
arXiv Detail & Related papers (2020-10-15T14:04:03Z)
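A simplified sketch of the mining step described in this entry: sentence embeddings for the two languages are compared by cosine similarity and the best-matching pairs above a threshold are kept as pseudo-parallel data. The threshold and function names are illustrative assumptions, and the self-training part of the method is omitted.

```python
import numpy as np

def mine_pseudo_parallel(src_embs: np.ndarray,
                         tgt_embs: np.ndarray,
                         threshold: float = 0.8):
    """Return (src_index, tgt_index, score) candidates whose cosine similarity
    clears the threshold. src_embs: [n_src, dim], tgt_embs: [n_tgt, dim],
    e.g. mean-pooled multilingual BERT sentence embeddings."""
    # Normalize so that a dot product equals cosine similarity.
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)

    sims = src @ tgt.T                        # [n_src, n_tgt] similarities
    best = sims.argmax(axis=1)                # nearest target for each source
    scores = sims[np.arange(len(src)), best]

    return [(i, int(j), float(s))
            for i, (j, s) in enumerate(zip(best, scores)) if s >= threshold]
```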
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)