Unsupervised Pretraining for Neural Machine Translation Using Elastic
Weight Consolidation
- URL: http://arxiv.org/abs/2010.09403v1
- Date: Mon, 19 Oct 2020 11:51:45 GMT
- Title: Unsupervised Pretraining for Neural Machine Translation Using Elastic
Weight Consolidation
- Authors: Dušan Variš and Ondřej Bojar
- Abstract summary: This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT).
In our method, we initialize the weights of the encoder and decoder with two language models trained on monolingual data.
We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the learning capacity of the encoder.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents our ongoing research on unsupervised pretraining in neural
machine translation (NMT). In our method, we initialize the weights of the encoder and
decoder with two language models trained on monolingual data and then fine-tune the model
on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original
language modeling tasks. We compare regularization by EWC with previous work that relies on
regularization by language modeling objectives. The positive result is that using EWC with
the decoder achieves BLEU scores similar to the previous work, while the model converges
2-3 times faster and does not require the original unlabeled training data during the
fine-tuning stage. In contrast, regularization by EWC is less effective when the original
and new tasks are not closely related: we show that initializing the bidirectional NMT
encoder with a left-to-right language model and forcing it to remember the original
left-to-right language modeling task limits the encoder's capacity to learn from the whole
bidirectional context.
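As a rough illustration of the fine-tuning objective described above, the sketch below adds the standard EWC penalty, lambda * sum_i F_i * (theta_i - theta*_i)^2, to the NMT loss, anchoring the fine-tuned parameters to the LM-initialized weights via a diagonal Fisher estimate computed on the language modeling task. This is a minimal PyTorch sketch under assumed names (EWCPenalty, estimate_diagonal_fisher, lm_loss_fn, nmt_cross_entropy), not the authors' implementation.

    import torch

    class EWCPenalty:
        """Fisher-weighted quadratic penalty that discourages drifting from the pretrained LM weights."""

        def __init__(self, model, fisher, ewc_lambda=1.0):
            # theta*: snapshot of the LM-initialized parameters, taken before NMT fine-tuning.
            self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
            self.fisher = fisher          # diagonal Fisher estimated on the LM task
            self.ewc_lambda = ewc_lambda  # strength of the consolidation term

        def __call__(self, model):
            penalty = 0.0
            for name, param in model.named_parameters():
                if name in self.fisher:
                    penalty = penalty + (self.fisher[name] * (param - self.anchor[name]) ** 2).sum()
            return self.ewc_lambda * penalty

    def estimate_diagonal_fisher(model, lm_loss_fn, lm_batches):
        # Approximate the diagonal Fisher with averaged squared gradients of the LM loss.
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for batch in lm_batches:
            model.zero_grad()
            lm_loss_fn(model, batch).backward()
            for name, param in model.named_parameters():
                if param.grad is not None:
                    fisher[name] += param.grad.detach() ** 2
        return {n: f / max(len(lm_batches), 1) for n, f in fisher.items()}

    # Hypothetical fine-tuning step on parallel data:
    #   ewc = EWCPenalty(model, estimate_diagonal_fisher(model, lm_loss_fn, lm_batches))
    #   loss = nmt_cross_entropy(model, parallel_batch) + ewc(model)
    #   loss.backward(); optimizer.step()

Once the anchor weights and the Fisher estimate are stored, the monolingual data is no longer needed during fine-tuning, which is the practical advantage over LM-objective regularization noted in the abstract.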
Related papers
- Efficient Machine Translation with a BiLSTM-Attention Approach [0.0]
This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model.
The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence.
Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset.
arXiv Detail & Related papers (2024-10-29T01:12:50Z)
- Universal Conditional Masked Language Pre-training for Neural Machine Translation [29.334361879066602]
We propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora.
We conduct extensive experiments and show that our CeMAT can achieve significant performance improvement for all scenarios.
arXiv Detail & Related papers (2022-03-17T10:00:33Z)
- Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer [49.897891031932545]
We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation.
We conduct an evaluation on two pivot-based machine translation tasks, namely French-German and German-Czech.
arXiv Detail & Related papers (2021-09-27T11:04:09Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English testsets than CRISS and m2m-100.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment [5.747195707763152]
We build an entity alignment model on top of XLM-RoBERTa to project the entities detected on the English part of the parallel data to the target language sentences.
Unlike translation-based methods, this approach benefits from the natural fluency and nuances of the original target-language corpus.
We evaluate the proposed approach on benchmark datasets over 4 target languages and obtain competitive F1 scores compared to the most recent SOTA models.
arXiv Detail & Related papers (2021-01-26T22:19:52Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) on English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)
- Universal Vector Neural Machine Translation With Effective Attention [0.0]
We propose a single model for Neural Machine Translation based on the encoder-decoder architecture.
We introduce a neutral/universal model representation that can be used to predict more than one language.
arXiv Detail & Related papers (2020-06-09T01:13:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.