Lifting the Curse of Multilinguality by Pre-training Modular
Transformers
- URL: http://arxiv.org/abs/2205.06266v1
- Date: Thu, 12 May 2022 17:59:56 GMT
- Title: Lifting the Curse of Multilinguality by Pre-training Modular
Transformers
- Authors: Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross,
Sebastian Riedel, Mikel Artetxe
- Abstract summary: Multilingual pre-trained models suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
We introduce language-specific modules, which allow us to grow the total capacity of the model while keeping the number of trainable parameters per language constant.
Our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting model usage to the set of pre-trained languages.
- Score: 72.46919537293068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual pre-trained models are known to suffer from the curse of
multilinguality, which causes per-language performance to drop as they cover
more languages. We address this issue by introducing language-specific modules,
which allow us to grow the total capacity of the model while keeping the
total number of trainable parameters per language constant. In contrast with
prior work that learns language-specific components post-hoc, we pre-train the
modules of our Cross-lingual Modular (X-Mod) models from the start. Our
experiments on natural language inference, named entity recognition and
question answering show that our approach not only mitigates the negative
interference between languages, but also enables positive transfer, resulting
in improved monolingual and cross-lingual performance. Furthermore, our
approach enables adding languages post-hoc with no measurable drop in
performance, no longer limiting the model usage to the set of pre-trained
languages.
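The modular design is easy to picture in code. Below is a minimal, illustrative PyTorch sketch of the idea, not the authors' released implementation: each layer shares attention and feed-forward weights across all languages, routes every example through one language-specific bottleneck module, and can register a fresh module post-hoc while the shared weights stay frozen. The names XModLayer, LanguageModule, and add_language are invented for this sketch.

```python
# Hedged sketch of an X-Mod-style layer: shared transformer weights plus a
# per-language bottleneck module selected by language ID at run time.
import torch
import torch.nn as nn


class LanguageModule(nn.Module):
    """Bottleneck feed-forward module owned by a single language."""

    def __init__(self, d_model: int, d_bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(x)))


class XModLayer(nn.Module):
    """Transformer layer whose attention/FFN are shared across languages,
    followed by a language-specific module chosen per forward pass."""

    def __init__(self, d_model: int, n_heads: int, languages: list[str]):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # One module per language: total capacity grows with the number of
        # languages, but each forward pass only ever uses one module.
        self.lang_modules = nn.ModuleDict(
            {lang: LanguageModule(d_model, d_model // 2) for lang in languages}
        )

    def add_language(self, lang: str, d_model: int) -> None:
        # Post-hoc extension: register a fresh module; shared weights stay frozen.
        self.lang_modules[lang] = LanguageModule(d_model, d_model // 2)

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h, _ = self.attn(x, x, x)
        x = self.norm1(x + h)
        x = self.norm2(x + self.ffn(x))
        return x + self.lang_modules[lang](x)


layer = XModLayer(d_model=64, n_heads=4, languages=["en", "de"])
x = torch.randn(2, 10, 64)          # (batch, sequence, hidden)
print(layer(x, lang="de").shape)    # torch.Size([2, 10, 64])
```

Because a forward pass touches only one language module, the per-language trainable parameter count stays constant even as total capacity grows with the number of languages.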
Related papers
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions; for example, calibrating in the target language efficiently retains language-modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
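As a rough illustration of the calibration question studied above, here is a hedged sketch of one standard calibration-based criterion (a Wanda-style score, |weight| times the norm of the calibration activations). The paper compares several state-of-the-art pruning techniques; this is only a generic stand-in, and the function name and the German calibration tensor are illustrative.

```python
# Hedged sketch of Wanda-style pruning with a language-specific calibration set:
# the choice of calibration language changes the activation statistics and
# therefore which weights survive.
import torch

def prune_with_calibration(weight, calib_inputs, sparsity=0.5):
    """weight: (out, in); calib_inputs: (n_samples, in) activations collected
    while running calibration text in the chosen language through the model."""
    act_norm = calib_inputs.norm(dim=0)      # per-input-feature L2 norm
    scores = weight.abs() * act_norm         # broadcast over output rows
    k = int(weight.numel() * sparsity)
    threshold = scores.flatten().kthvalue(k).values
    return weight * (scores > threshold)     # zero out the lowest-scoring weights

w = torch.randn(8, 16)
calib_de = torch.randn(32, 16)   # stand-in for German calibration activations
print((prune_with_calibration(w, calib_de) == 0).float().mean())  # ~0.5
```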
- LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution: adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and that task-specific tuning generally enhances performance on downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z)
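Of the strategies listed above, vocabulary extension is the most mechanical; a hedged sketch using the Hugging Face transformers API is below. The checkpoint name is a placeholder (the actual repo is gated) and the example tokens are illustrative, not the paper's mined vocabulary.

```python
# Hedged sketch of vocabulary extension: add target-language tokens to an
# English-centric tokenizer and grow the embedding matrix to match.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"   # placeholder checkpoint, not the paper's exact setup
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Subword units mined from a Turkish corpus (illustrative examples only).
new_tokens = ["ğı", "şı", "lar", "ler"]
num_added = tokenizer.add_tokens(new_tokens)

# New embedding rows start near the existing embedding distribution;
# continual training on target-language text then adapts them.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```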
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce language arithmetic, a novel method that enhances language adapters through training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in MAD-X-based cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
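The paper's exact arithmetic recipe is its contribution; the sketch below only shows the generic mechanics such a method builds on: linearly combining the parameters of existing language adapters without any gradient updates, in the spirit of task arithmetic. All names and coefficients are illustrative.

```python
# Hedged sketch of training-free arithmetic over adapter parameters.
import torch

def language_arithmetic(adapters, coeffs):
    """adapters: list of state dicts with identical keys and shapes;
    coeffs: one scalar per adapter. Returns their linear combination."""
    return {
        key: sum(c * sd[key] for c, sd in zip(coeffs, adapters))
        for key in adapters[0]
    }

# Toy state dicts standing in for MAD-X-style language adapters.
ad_en = {"down.weight": torch.randn(8, 16), "up.weight": torch.randn(16, 8)}
ad_de = {k: torch.randn_like(v) for k, v in ad_en.items()}

# e.g. nudge the German adapter toward English by a small step, with no training.
combined = language_arithmetic([ad_de, ad_en], coeffs=[0.8, 0.2])
print(combined["down.weight"].shape)  # torch.Size([8, 16])
```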
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, acquiring lower-level linguistic skills before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
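Checkpoint probing of this kind is commonly done with linear probes; the sketch below shows the generic mechanics under that assumption, with toy tensors standing in for representations extracted from real XLM-R checkpoints.

```python
# Hedged sketch of checkpoint probing: fit a linear classifier on frozen
# representations from each checkpoint and track when a skill emerges.
import torch
import torch.nn as nn

def probe_accuracy(features, labels, n_classes, epochs=100, lr=0.1):
    """Fit a linear probe on frozen features; return training accuracy."""
    probe = nn.Linear(features.shape[1], n_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(probe(features), labels).backward()
        opt.step()
    return (probe(features).argmax(-1) == labels).float().mean().item()

# Toy stand-ins for representations extracted at two pretraining checkpoints.
labels = torch.randint(0, 2, (200,))
early = torch.randn(200, 64)                       # early checkpoint: noise
late = torch.randn(200, 64) + labels[:, None] * 2  # later: linearly separable
print(probe_accuracy(early, labels, 2), probe_accuracy(late, labels, 2))
```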
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The method replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
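The recipe above translates almost directly into code; here is a minimal sketch under the assumption of a PyTorch encoder, with a toy model standing in for the pretrained NMT system. Freezing every original parameter is what guarantees the initial languages are untouched.

```python
# Hedged sketch of the language-specific-embedding recipe: freeze all original
# parameters and train only a fresh embedding table for the new language.
import torch
import torch.nn as nn

base_model = nn.TransformerEncoder(      # stand-in for the pretrained NMT body
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
for p in base_model.parameters():
    p.requires_grad = False              # original weights stay untouched

new_vocab_size = 8_000                   # small language-specific vocabulary
new_embed = nn.Embedding(new_vocab_size, 64)
optimizer = torch.optim.Adam(new_embed.parameters(), lr=1e-4)

tokens = torch.randint(0, new_vocab_size, (2, 10))
hidden = base_model(new_embed(tokens))
hidden.sum().backward()                  # toy objective: only new_embed gets grads
print(new_embed.weight.grad is not None) # True; base_model gradients stay None
```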
- On the ability of monolingual models to learn language-agnostic representations [2.604227467422371]
We show that monolingual models pretrained and fine-tuned on different languages achieve competitive performance.
For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
arXiv Detail & Related papers (2021-09-04T22:09:44Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study multilingual language models to understand their capability and adaptability in the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations for code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.