Exploring Unsupervised Pretraining Objectives for Machine Translation
- URL: http://arxiv.org/abs/2106.05634v1
- Date: Thu, 10 Jun 2021 10:18:23 GMT
- Title: Exploring Unsupervised Pretraining Objectives for Machine Translation
- Authors: Christos Baziotis, Ivan Titov, Alexandra Birch, Barry Haddow
- Abstract summary: Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
- Score: 99.5441395624651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised cross-lingual pretraining has achieved strong results in neural
machine translation (NMT), by drastically reducing the need for large parallel
data. Most approaches adapt masked-language modeling (MLM) to
sequence-to-sequence architectures, by masking parts of the input and
reconstructing them in the decoder. In this work, we systematically compare
masking with alternative objectives that produce inputs resembling real (full)
sentences, by reordering and replacing words based on their context. We
pretrain models with different methods on English$\leftrightarrow$German,
English$\leftrightarrow$Nepali and English$\leftrightarrow$Sinhala monolingual
data, and evaluate them on NMT. In (semi-) supervised NMT, varying the
pretraining objective leads to surprisingly small differences in the finetuned
performance, whereas unsupervised NMT is much more sensitive to it. To
understand these results, we thoroughly study the pretrained models using a
series of probes and verify that they encode and use information in different
ways. We conclude that finetuning on parallel data is mostly sensitive to a few
properties that are shared by most models, such as a strong decoder, whereas
unsupervised NMT also requires models with strong cross-lingual abilities.
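As a rough illustration of the three kinds of input corruption the paper compares (masking, reordering, and replacement), the toy sketch below applies each to a whitespace-tokenized sentence. The function names and parameters (e.g. mask_prob, window) are illustrative assumptions; the paper's actual objectives operate on subword sequences inside sequence-to-sequence models, and replacements there are context-based rather than random.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_prob=0.35, rng=random):
    """MLM-style corruption: hide a fraction of tokens behind a mask symbol.
    The pretraining task is to reconstruct the original sentence in the decoder."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]

def shuffle_locally(tokens, window=3, rng=random):
    """Reordering-style corruption: permute tokens within small local windows,
    so the input still resembles a full (but scrambled) sentence."""
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

def replace_tokens(tokens, vocab, replace_prob=0.35, rng=random):
    """Replacement-style corruption: swap some tokens for other words drawn from
    a vocabulary (a stand-in for the paper's context-based replacements)."""
    return [rng.choice(vocab) if rng.random() < replace_prob else t
            for t in tokens]

if __name__ == "__main__":
    sentence = "unsupervised pretraining reduces the need for parallel data".split()
    vocab = ["model", "translation", "large", "improves", "corpus"]
    random.seed(0)
    print("masked:   ", " ".join(mask_tokens(sentence)))
    print("reordered:", " ".join(shuffle_locally(sentence)))
    print("replaced: ", " ".join(replace_tokens(sentence, vocab)))
```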
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists in adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z) - Better Datastore, Better Translation: Generating Datastores from
Pre-Trained Models for Nearest Neural Machine Translation [48.58899349349702]
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism.
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
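For context, the kNN-MT mechanism that PRED builds on can be sketched as follows: a datastore maps decoder hidden states to the target tokens they produced, and at each decoding step the retrieved neighbours form a distribution that is interpolated with the base model's prediction. The snippet below is a minimal toy sketch with made-up vectors and a hypothetical interpolation weight lam, not the PRED implementation.

```python
import math
from collections import defaultdict

def knn_interpolate(hidden, datastore, p_model, k=4, temperature=10.0, lam=0.5):
    """Toy kNN-MT step: mix the base model's next-token distribution with a
    distribution built from the k nearest (hidden_state -> target_token) entries."""
    # Euclidean distance between the query state and every datastore key.
    dists = [(math.dist(hidden, key), tok) for key, tok in datastore]
    dists.sort(key=lambda x: x[0])
    neighbours = dists[:k]

    # Softmax over negative distances gives the retrieval distribution p_knn.
    weights = [math.exp(-d / temperature) for d, _ in neighbours]
    total = sum(weights)
    p_knn = defaultdict(float)
    for w, (_, tok) in zip(weights, neighbours):
        p_knn[tok] += w / total

    # Interpolate: p(y) = lam * p_knn(y) + (1 - lam) * p_model(y).
    vocab = set(p_model) | set(p_knn)
    return {tok: lam * p_knn[tok] + (1 - lam) * p_model.get(tok, 0.0)
            for tok in vocab}

if __name__ == "__main__":
    datastore = [((0.1, 0.2), "Haus"), ((0.9, 0.8), "Auto"),
                 ((0.15, 0.25), "Haus"), ((0.8, 0.9), "Wagen")]
    p_model = {"Haus": 0.3, "Auto": 0.5, "Wagen": 0.2}
    print(knn_interpolate((0.12, 0.22), datastore, p_model, k=2))
```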
arXiv Detail & Related papers (2022-12-17T08:34:20Z) - Universal Conditional Masked Language Pre-training for Neural Machine
Translation [29.334361879066602]
We propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora.
We conduct extensive experiments and show that CeMAT achieves significant performance improvements across all scenarios.
arXiv Detail & Related papers (2022-03-17T10:00:33Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
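The summary names Dynamic Blocking without describing it. As a hedged sketch of the general idea of discouraging verbatim copying during decoding, one plausible rule is: whenever the hypothesis reproduces a source token, temporarily forbid the token that follows it in the source. The actual Dynamic Blocking algorithm may differ in its details.

```python
def blocked_tokens(source_tokens, generated_tokens):
    """Hedged sketch of a copy-blocking constraint: if the last generated token
    also appears in the source, forbid the source token that immediately follows
    it, so the decoder cannot simply continue copying the source verbatim.
    (The actual Dynamic Blocking rule may differ in its details.)"""
    if not generated_tokens:
        return set()
    last = generated_tokens[-1]
    blocked = set()
    for i, tok in enumerate(source_tokens[:-1]):
        if tok == last:
            blocked.add(source_tokens[i + 1])
    return blocked

if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog".split()
    hypothesis = "a fast brown".split()
    # "brown" occurs in the source, so its successor "fox" would be blocked next.
    print(blocked_tokens(source, hypothesis))
```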
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Unsupervised Pretraining for Neural Machine Translation Using Elastic
Weight Consolidation [0.0]
This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT).
In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data.
We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the learning capacity of the encoder.
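Elastic Weight Consolidation itself is a standard regularizer that penalizes parameters for drifting away from their pretrained values in proportion to their estimated importance (Fisher information). The sketch below shows only that penalty term with toy values; how the paper combines it with the NMT loss is not detailed in this summary.

```python
def ewc_penalty(params, pretrained_params, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty:
    lam/2 * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the (frozen) pretrained language-model weights and F_i is an
    estimate of how important parameter i was for the original LM task."""
    return 0.5 * lam * sum(
        f * (p - p0) ** 2
        for p, p0, f in zip(params, pretrained_params, fisher)
    )

if __name__ == "__main__":
    theta      = [0.8, -0.1, 0.5]   # current NMT parameters (toy values)
    theta_star = [1.0,  0.0, 0.5]   # pretrained LM parameters
    fisher     = [2.0,  0.1, 5.0]   # per-parameter importance estimates
    # A total loss would look like: translation_loss + ewc_penalty(...)
    print(ewc_penalty(theta, theta_star, fisher, lam=0.4))
```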
arXiv Detail & Related papers (2020-10-19T11:51:45Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
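To our understanding, mRASP leverages alignment information through random aligned substitution: during pretraining, some source words are swapped for dictionary translations in other languages, so that words with the same meaning end up with nearby representations. The sketch below assumes that reading and uses a hypothetical toy dictionary.

```python
import random

def random_aligned_substitution(tokens, dictionary, sub_prob=0.3, rng=random):
    """Sketch of random aligned substitution: with some probability, replace a
    source word with one of its translations from a bilingual dictionary, so the
    encoder sees mixed-language inputs during pretraining."""
    out = []
    for tok in tokens:
        translations = dictionary.get(tok)
        if translations and rng.random() < sub_prob:
            out.append(rng.choice(translations))
        else:
            out.append(tok)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    dictionary = {"house": ["Haus", "maison"], "big": ["gross", "grand"]}
    sentence = "the big house is empty".split()
    print(" ".join(random_aligned_substitution(sentence, dictionary, rng=rng)))
```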
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Cross-lingual Supervision Improves Unsupervised Neural Machine
Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method significantly improves translation quality by more than 3 BLEU on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.