Continual Learning in Multilingual NMT via Language-Specific Embeddings
- URL: http://arxiv.org/abs/2110.10478v1
- Date: Wed, 20 Oct 2021 10:38:57 GMT
- Title: Continual Learning in Multilingual NMT via Language-Specific Embeddings
- Authors: Alexandre Berard
- Abstract summary: The proposed technique consists in replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
- Score: 92.91823064720232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a technique for adding a new source or target language to
an existing multilingual NMT model without re-training it on the initial set of
languages. It consists in replacing the shared vocabulary with a small
language-specific vocabulary and fine-tuning the new embeddings on the new
language's parallel data. Some additional language-specific components may be
trained to improve performance (e.g., Transformer layers or adapter modules).
Because the parameters of the original model are not modified, its performance
on the initial languages does not degrade. We show on two sets of experiments
(small-scale on TED Talks, and large-scale on ParaCrawl) that this approach
performs as well as or better than the more costly alternatives; and that it has
excellent zero-shot performance: training on English-centric data is enough to
translate between the new language and any of the initial languages.
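To make the mechanism concrete, here is a minimal PyTorch-style sketch of the frozen-model, language-specific-embedding setup described in the abstract. It is an illustration under stated assumptions, not the paper's released code: the names `LanguageSpecificEmbedding`, `setup_continual_finetuning`, `d_model`, and the optimizer settings are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): add a trainable
# language-specific embedding table for a new language while keeping every
# parameter of the original multilingual NMT model frozen.
import torch
import torch.nn as nn

class LanguageSpecificEmbedding(nn.Module):
    """Embedding table built from a small vocabulary of the new language."""
    def __init__(self, vocab_size: int, d_model: int, pad_idx: int = 0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=pad_idx)
        self.scale = d_model ** 0.5  # standard Transformer embedding scaling

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids) * self.scale

def setup_continual_finetuning(base_model: nn.Module,
                               new_embedding: LanguageSpecificEmbedding,
                               lr: float = 1e-4) -> torch.optim.Optimizer:
    # Freeze the original model so its performance on the initial
    # languages cannot degrade.
    for param in base_model.parameters():
        param.requires_grad = False
    # Only the new language-specific embeddings receive gradient updates.
    return torch.optim.Adam(new_embedding.parameters(), lr=lr)
```

During fine-tuning on the new language's parallel data, tokens would be segmented with the new language-specific vocabulary and embedded with `LanguageSpecificEmbedding` in place of the original shared embedding table; as the abstract notes, a few Transformer layers or adapter modules could additionally be trained if more capacity is needed.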
Related papers
- Learn and Don't Forget: Adding a New Language to ASR Foundation Models [33.98622415462255]
Foundation ASR models often support many languages, e.g. 100 languages in Whisper.
Fine-tuning, while simple, may degrade the accuracy of the original set.
EWC offers an alternative compromise with the potential to maintain performance in specific target languages.
arXiv Detail & Related papers (2024-07-09T12:14:48Z) - Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process, which mimicks the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z) - Embedding structure matters: Comparing methods to adapt multilingual
vocabularies to new languages [20.17308477850864]
Pre-trained multilingual language models underpin a large portion of modern NLP tools outside of English.
We propose several simple techniques to replace a cross-lingual vocabulary with a compact, language-specific one.
arXiv Detail & Related papers (2023-09-09T04:27:18Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Language-Family Adapters for Low-Resource Multilingual Neural Machine
Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer; an illustrative adapter sketch follows this list.
arXiv Detail & Related papers (2022-09-30T05:02:42Z) - WECHSEL: Effective initialization of subword embeddings for
cross-lingual transfer of monolingual language models [3.6878069324996616]
We introduce a method -- called WECHSEL -- to transfer English models to new languages.
We use WECHSEL to transfer GPT-2 and RoBERTa models to 4 other languages.
arXiv Detail & Related papers (2021-12-13T12:26:02Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Reusing a Pretrained Language Model on Languages with Limited Corpora
for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)