Universal Conditional Masked Language Pre-training for Neural Machine
Translation
- URL: http://arxiv.org/abs/2203.09210v2
- Date: Sun, 20 Mar 2022 12:11:41 GMT
- Title: Universal Conditional Masked Language Pre-training for Neural Machine
Translation
- Authors: Pengfei Li, Liangyou Li, Meng Zhang, Minghao Wu, Qun Liu
- Abstract summary: We propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora.
We conduct extensive experiments and show that our CeMAT can achieve significant performance improvement for all scenarios.
- Score: 29.334361879066602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained sequence-to-sequence models have significantly improved Neural
Machine Translation (NMT). Unlike prior works, where pre-trained models
usually adopt a unidirectional decoder, this paper demonstrates that
pre-training a sequence-to-sequence model but with a bidirectional decoder can
produce notable performance gains for both Autoregressive and
Non-autoregressive NMT. Specifically, we propose CeMAT, a conditional masked
language model pre-trained on large-scale bilingual and monolingual corpora in
many languages. We also introduce two simple but effective methods to enhance
the CeMAT, aligned code-switching & masking and dynamic dual-masking. We
conduct extensive experiments and show that our CeMAT can achieve significant
performance improvement for all scenarios from low- to extremely high-resource
languages, i.e., up to +14.4 BLEU for low-resource languages and +7.9 BLEU on
average for Autoregressive NMT. For Non-autoregressive NMT, we demonstrate it
can also produce consistent performance gains, i.e., up to +5.3 BLEU. To the
best of our knowledge, this is the first work to pre-train a unified model for
fine-tuning on both NMT tasks. Code, data, and pre-trained models are available
at https://github.com/huawei-noah/Pretrained-Language-Model/CeMAT
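As a rough, self-contained illustration of the conditional masked language modeling objective described above, the Python sketch below mimics the dual-masking idea: tokens are masked on both the source and the target side, and a bidirectional decoder is then trained to recover the masked target tokens from the masked source plus the visible target context. The token ids, masking ratios, and function name are illustrative assumptions; the aligned code-switching & masking step is not modeled here, and the released CeMAT code in the repository linked above is the authoritative reference.

```python
import random

MASK_ID = 4  # illustrative <mask> token id; the real vocabulary layout will differ

def dynamic_dual_mask(src_ids, tgt_ids, src_ratio=0.2, tgt_ratio=0.5):
    """Toy stand-in for dual-masking: mask a random subset of source tokens and
    a (typically larger) random subset of target tokens. Returns the masked
    inputs plus the labels a bidirectional decoder must recover; -100 marks
    target positions that do not contribute to the loss."""
    masked_src = [MASK_ID if random.random() < src_ratio else t for t in src_ids]
    masked_tgt, labels = [], []
    for t in tgt_ids:
        if random.random() < tgt_ratio:
            masked_tgt.append(MASK_ID)   # decoder must predict the original token here
            labels.append(t)
        else:
            masked_tgt.append(t)         # visible target context (no causal mask)
            labels.append(-100)
    return masked_src, masked_tgt, labels

# Example: the encoder reads masked_src; the bidirectional decoder reads masked_tgt
# in full (no left-to-right constraint) and is trained to fill in `labels`.
src = [11, 12, 13, 14]   # e.g. a source sentence as token ids
tgt = [21, 22, 23]       # its translation as token ids
print(dynamic_dual_mask(src, tgt))
```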
Related papers
- Better Datastore, Better Translation: Generating Datastores from
Pre-Trained Models for Nearest Neural Machine Translation [48.58899349349702]
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism (an illustrative sketch of this retrieval step follows the list below).
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
arXiv Detail & Related papers (2022-12-17T08:34:20Z)
- End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and produces synthetic bilingual sentence pairs for training the other NMT model.
The discrete nature of the translated sentences prevents gradients from flowing between the two NMT models (see the categorical reparameterization sketch after this list).
arXiv Detail & Related papers (2022-02-17T06:31:03Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of an NMT model with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English test sets than CRISS and m2m-100.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that translation quality suffers because these representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation [0.0]
This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT).
In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data.
We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the learning capacity of the encoder.
arXiv Detail & Related papers (2020-10-19T11:51:45Z)
- Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation [50.0258495437314]
We investigate the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT).
For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings.
For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen.
arXiv Detail & Related papers (2020-04-30T16:09:22Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
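The first related paper above builds on kNN-MT, whose token-level retrieval step can be made concrete with a minimal NumPy sketch: at each decoding step the current decoder state queries a datastore of (hidden state, next-token) pairs, and the resulting retrieval distribution is interpolated with the NMT model's own softmax output. The datastore contents, temperature, and interpolation weight below are illustrative assumptions, not the configuration used by PRED.

```python
import numpy as np

def knn_mt_step(query, keys, values, model_probs, k=2, temperature=10.0, lam=0.5):
    """Toy kNN-MT step: retrieve the k nearest datastore entries for the current
    decoder hidden state and interpolate their token distribution with the NMT
    model's output distribution. values[i] is the target token id that followed
    the stored state keys[i] when the datastore was built."""
    dists = np.sum((keys - query) ** 2, axis=1)          # squared L2 distance to every key
    nearest = np.argsort(dists)[:k]                      # indices of the k closest entries
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()                             # softmax over negative distances
    knn_probs = np.zeros_like(model_probs)
    for w, i in zip(weights, nearest):
        knn_probs[values[i]] += w                        # accumulate mass on retrieved tokens
    return lam * knn_probs + (1.0 - lam) * model_probs   # final next-token distribution

# Tiny worked example with a 4-entry datastore and a 5-token vocabulary.
keys = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]])
values = np.array([3, 1, 1, 4])                          # token ids stored with each key
query = np.array([1.0, 0.05])                            # current decoder state
model_probs = np.full(5, 0.2)                            # uniform model distribution
print(knn_mt_step(query, keys, values, model_probs))
```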
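The back-translation entry above points at the core obstacle to end-to-end training: sampled translations are discrete, so gradients cannot flow from the reverse model back into the forward model. The categorical reparameterization trick named in that title is commonly realized with a straight-through Gumbel-softmax; the PyTorch sketch below illustrates only that generic mechanism, since the paper's exact procedure is not detailed in the summary, and the embedding table and loss are placeholders.

```python
import torch
import torch.nn.functional as F

# Logits the "forward" NMT model assigns over a tiny vocabulary for a 3-token output.
logits = torch.randn(3, 8, requires_grad=True)            # (target length, vocab size)

# Straight-through Gumbel-softmax: the forward pass yields one-hot "token" vectors,
# while the backward pass uses the soft relaxation, so gradients can pass through
# the discrete sampling step.
one_hot_tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)

# The reverse NMT model would consume one_hot_tokens @ embedding instead of hard
# token ids; a dummy loss here just shows that gradients reach the forward logits.
embedding = torch.randn(8, 16)                             # toy embedding table (vocab, dim)
token_vectors = one_hot_tokens @ embedding                 # differentiable "embedding lookup"
loss = token_vectors.pow(2).mean()                         # stand-in for the reverse model's loss
loss.backward()
print(logits.grad.abs().sum() > 0)                         # True: gradients flow end to end
```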