Reusing a Pretrained Language Model on Languages with Limited Corpora
for Unsupervised NMT
- URL: http://arxiv.org/abs/2009.07610v3
- Date: Tue, 6 Oct 2020 13:54:47 GMT
- Title: Reusing a Pretrained Language Model on Languages with Limited Corpora
for Unsupervised NMT
- Authors: Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser
- Abstract summary: We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four translation directions.
- Score: 129.99918589405675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using a language model (LM) pretrained on two languages with large
monolingual data in order to initialize an unsupervised neural machine
translation (UNMT) system yields state-of-the-art results. When limited data is
available for one language, however, this method leads to poor translations. We
present an effective approach that reuses an LM that is pretrained only on the
high-resource language. The monolingual LM is fine-tuned on both languages and
is then used to initialize a UNMT model. To reuse the pretrained LM, we have to
modify its predefined vocabulary to account for the new language. We therefore
propose a novel vocabulary extension method. Our approach, RE-LM, outperforms a
competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk)
and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four
translation directions.
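The vocabulary extension step can be pictured as enlarging the pretrained LM's embedding matrix with rows for the new language's subwords before fine-tuning on both languages. A minimal PyTorch-style sketch of that idea follows; it is illustrative only (function and variable names are assumptions, not the paper's code):

    import torch
    import torch.nn as nn

    def extend_embeddings(old_emb: nn.Embedding, num_new_tokens: int) -> nn.Embedding:
        """Copy the pretrained rows and append randomly initialized rows
        for the new language's subwords (illustrative sketch)."""
        old_vocab_size, dim = old_emb.weight.shape
        new_emb = nn.Embedding(old_vocab_size + num_new_tokens, dim)
        with torch.no_grad():
            new_emb.weight[:old_vocab_size] = old_emb.weight             # reuse pretrained rows
            new_emb.weight[old_vocab_size:].normal_(mean=0.0, std=0.02)  # new-language rows
        return new_emb

The extended LM would then be fine-tuned on monolingual data from both languages before initializing the UNMT model, as the abstract describes.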
Related papers
- An Efficient Multilingual Language Model Compression through Vocabulary Trimming [16.568276582466833]
Vocabulary trimming (VT) is a method that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens (a minimal sketch of the idea follows this entry).
In our experiments, we show that VT can retain the original performance of the multilingual LM while being smaller in size.
This methodology keeps the best of both the monolingual and multilingual worlds by keeping the model as small as a monolingual one.
arXiv Detail & Related papers (2023-05-24T11:00:33Z)
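Vocabulary trimming, as summarized above, amounts to keeping only the embedding rows of tokens that occur in target-language text and remapping token ids. A minimal, hedged sketch (not the paper's implementation; keep_ids would come from tokenizing a target-language corpus):

    import torch
    import torch.nn as nn

    def trim_vocabulary(emb: nn.Embedding, keep_ids: list[int]):
        """Keep only the rows for tokens seen in the target-language corpus
        and return the smaller embedding plus an old-id -> new-id mapping."""
        keep_ids = sorted(set(keep_ids))
        id_map = {old_id: new_id for new_id, old_id in enumerate(keep_ids)}
        trimmed = nn.Embedding(len(keep_ids), emb.weight.shape[1])
        with torch.no_grad():
            trimmed.weight.copy_(emb.weight[keep_ids])
        return trimmed, id_map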
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes the new embeddings on the new language's parallel data (see the sketch after this entry).
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
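The language-specific-embedding recipe above can be pictured as freezing every pretrained parameter and training only a new, small embedding table for the added language. An illustrative PyTorch sketch (names are assumptions):

    import torch.nn as nn

    def add_language_specific_embeddings(model: nn.Module, new_vocab_size: int, dim: int) -> nn.Embedding:
        """Freeze the original model and create a fresh, trainable embedding
        table for the new language's small vocabulary (sketch only)."""
        for param in model.parameters():
            param.requires_grad = False           # original languages are left untouched
        return nn.Embedding(new_vocab_size, dim)  # only these weights get trained

Only the new embeddings (and any tied output projection for the new vocabulary) would be updated on the new language's parallel data, which is why performance on the initial languages is preserved.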
- MergeDistill: Merging Pre-trained Language Models using Distillation [5.396915402673246]
We propose MergeDistill, a framework to merge pre-trained LMs in a way that best leverages their assets with minimal dependencies.
We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that, at a fixed model capacity, perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data (a generic distillation sketch follows this entry).
arXiv Detail & Related papers (2021-06-05T08:22:05Z)
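MergeDistill's exact training setup is described in the paper; the generic knowledge-distillation ingredient it builds on can be sketched as training a student on the teachers' soft predictions, for example with a temperature-softened KL term (illustrative only; the temperature and weighting are assumptions):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
        """KL(teacher || student) on softened distributions -- a standard
        distillation objective, not necessarily MergeDistill's exact loss."""
        student_logp = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(student_logp, teacher_p, reduction="batchmean") * temperature ** 2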
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method consistently improves fine-tuning performance over the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
However, it works less well for low-resource and distant language pairs; previous research has shown that this is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation [54.52971020087777]
Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models.
Self-supervision improves zero-shot translation quality in multilingual models.
We get up to 33 BLEU on ro-en translation without any parallel data or back-translation.
arXiv Detail & Related papers (2020-05-11T00:20:33Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term which pushes the output distributions of the TM to be probable under the LM prior (a hedged sketch of one such regularizer follows this entry).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
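The LM-prior regularizer described above can be pictured as adding, to the usual translation cross-entropy, a term that penalizes divergence between the TM's output distribution and the LM's distribution. A hedged sketch follows; the divergence direction, weighting, and any temperature are assumptions and may differ from the paper:

    import torch
    import torch.nn.functional as F

    def loss_with_lm_prior(tm_logits: torch.Tensor,
                           lm_logits: torch.Tensor,
                           targets: torch.Tensor,
                           lam: float = 0.5) -> torch.Tensor:
        """Translation cross-entropy plus a KL term pulling the TM's output
        distribution toward the LM prior (illustrative only)."""
        xent = F.cross_entropy(tm_logits.view(-1, tm_logits.size(-1)), targets.view(-1))
        tm_logp = F.log_softmax(tm_logits, dim=-1)
        lm_p = F.softmax(lm_logits, dim=-1).detach()          # the LM prior is not updated
        kl = F.kl_div(tm_logp, lm_p, reduction="batchmean")   # KL(LM || TM), one plausible choice
        return xent + lam * kl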