Reusing a Pretrained Language Model on Languages with Limited Corpora
for Unsupervised NMT
- URL: http://arxiv.org/abs/2009.07610v3
- Date: Tue, 6 Oct 2020 13:54:47 GMT
- Title: Reusing a Pretrained Language Model on Languages with Limited Corpora
for Unsupervised NMT
- Authors: Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser
- Abstract summary: We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four translation directions.
- Score: 129.99918589405675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using a language model (LM) pretrained on two languages with large
monolingual data in order to initialize an unsupervised neural machine
translation (UNMT) system yields state-of-the-art results. When limited data is
available for one language, however, this method leads to poor translations. We
present an effective approach that reuses an LM that is pretrained only on the
high-resource language. The monolingual LM is fine-tuned on both languages and
is then used to initialize a UNMT model. To reuse the pretrained LM, we have to
modify its predefined vocabulary to account for the new language. We therefore
propose a novel vocabulary extension method. Our approach, RE-LM, outperforms a
competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk)
and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four
translation directions.
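The vocabulary extension step can be pictured as enlarging the pretrained LM's embedding matrix with rows for the new language's subwords before fine-tuning on both languages. A minimal PyTorch-style sketch of that idea follows; it is illustrative only (function and variable names are assumptions, not the paper's code):

    import torch
    import torch.nn as nn

    def extend_embeddings(old_emb: nn.Embedding, num_new_tokens: int) -> nn.Embedding:
        """Copy the pretrained rows and append randomly initialized rows
        for the new language's subwords (illustrative sketch)."""
        old_vocab_size, dim = old_emb.weight.shape
        new_emb = nn.Embedding(old_vocab_size + num_new_tokens, dim)
        with torch.no_grad():
            new_emb.weight[:old_vocab_size] = old_emb.weight             # reuse pretrained rows
            new_emb.weight[old_vocab_size:].normal_(mean=0.0, std=0.02)  # new-language rows
        return new_emb

The extended LM would then be fine-tuned on monolingual data from both languages before initializing the UNMT model, as the abstract describes.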
Related papers
- An Efficient Multilingual Language Model Compression through Vocabulary Trimming [16.568276582466833]
Vocabulary trimming (VT) is a method that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens (a minimal sketch of the idea follows this entry).
In our experiments, we show that VT can retain the original performance of the multilingual LM while being smaller in size.
This methodology keeps the best of both the monolingual and multilingual worlds by keeping the model as small as a monolingual one.
arXiv Detail & Related papers (2023-05-24T11:00:33Z)
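Vocabulary trimming, as summarized above, amounts to keeping only the embedding rows of tokens that occur in target-language text and remapping token ids. A minimal, hedged sketch (not the paper's implementation; keep_ids would come from tokenizing a target-language corpus):

    import torch
    import torch.nn as nn

    def trim_vocabulary(emb: nn.Embedding, keep_ids: list[int]):
        """Keep only the rows for tokens seen in the target-language corpus
        and return the smaller embedding plus an old-id -> new-id mapping."""
        keep_ids = sorted(set(keep_ids))
        id_map = {old_id: new_id for new_id, old_id in enumerate(keep_ids)}
        trimmed = nn.Embedding(len(keep_ids), emb.weight.shape[1])
        with torch.no_grad():
            trimmed.weight.copy_(emb.weight[keep_ids])
        return trimmed, id_map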
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes the new embeddings on the new language's parallel data (see the sketch after this entry).
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
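The language-specific-embedding recipe above can be pictured as freezing every pretrained parameter and training only a new, small embedding table for the added language. An illustrative PyTorch sketch (names are assumptions):

    import torch.nn as nn

    def add_language_specific_embeddings(model: nn.Module, new_vocab_size: int, dim: int) -> nn.Embedding:
        """Freeze the original model and create a fresh, trainable embedding
        table for the new language's small vocabulary (sketch only)."""
        for param in model.parameters():
            param.requires_grad = False           # original languages are left untouched
        return nn.Embedding(new_vocab_size, dim)  # only these weights get trained

Only the new embeddings (and any tied output projection for the new vocabulary) would be updated on the new language's parallel data, which is why performance on the initial languages is preserved.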
- MergeDistill: Merging Pre-trained Language Models using Distillation [5.396915402673246]
We propose MergeDistill, a framework to merge pre-trained LMs in a way that best leverages their assets with minimal dependencies.
We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that, at a fixed model capacity, perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data (a generic distillation sketch follows this entry).
arXiv Detail & Related papers (2021-06-05T08:22:05Z)
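MergeDistill's exact training setup is described in the paper; the generic knowledge-distillation ingredient it builds on can be sketched as training a student on the teachers' soft predictions, for example with a temperature-softened KL term (illustrative only; the temperature and weighting are assumptions):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
        """KL(teacher || student) on softened distributions -- a standard
        distillation objective, not necessarily MergeDistill's exact loss."""
        student_logp = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(student_logp, teacher_p, reduction="batchmean") * temperature ** 2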
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method consistently improves fine-tuning performance over the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
However, it works less well for low-resource and distant language pairs; previous research has shown that this is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation [54.52971020087777]
Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models.
Self-supervision improves zero-shot translation quality in multilingual models.
We get up to 33 BLEU on ro-en translation without any parallel data or back-translation.
arXiv Detail & Related papers (2020-05-11T00:20:33Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term which pushes the output distributions of the TM to be probable under the LM prior (a hedged sketch of one such regularizer follows this entry).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
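The LM-prior regularizer described above can be pictured as adding, to the usual translation cross-entropy, a term that penalizes divergence between the TM's output distribution and the LM's distribution. A hedged sketch follows; the divergence direction, weighting, and any temperature are assumptions and may differ from the paper:

    import torch
    import torch.nn.functional as F

    def loss_with_lm_prior(tm_logits: torch.Tensor,
                           lm_logits: torch.Tensor,
                           targets: torch.Tensor,
                           lam: float = 0.5) -> torch.Tensor:
        """Translation cross-entropy plus a KL term pulling the TM's output
        distribution toward the LM prior (illustrative only)."""
        xent = F.cross_entropy(tm_logits.view(-1, tm_logits.size(-1)), targets.view(-1))
        tm_logp = F.log_softmax(tm_logits, dim=-1)
        lm_p = F.softmax(lm_logits, dim=-1).detach()          # the LM prior is not updated
        kl = F.kl_div(tm_logp, lm_p, reduction="batchmean")   # KL(LM || TM), one plausible choice
        return xent + lam * kl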