Match the Script, Adapt if Multilingual: Analyzing the Effect of
Multilingual Pretraining on Cross-lingual Transferability
- URL: http://arxiv.org/abs/2203.10753v1
- Date: Mon, 21 Mar 2022 06:52:38 GMT
- Title: Match the Script, Adapt if Multilingual: Analyzing the Effect of
Multilingual Pretraining on Cross-lingual Transferability
- Authors: Yoshinari Fujinuma, Jordan Boyd-Graber, Katharina Kann
- Abstract summary: Pretrained multilingual models enable zero-shot learning even for unseen languages.
It is unclear how the number of pretraining languages influences a model's zero-shot learning for languages unseen during pretraining.
- Score: 26.553524219316188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained multilingual models enable zero-shot learning even for unseen
languages, and that performance can be further improved via adaptation prior to
finetuning. However, it is unclear how the number of pretraining languages
influences a model's zero-shot learning for languages unseen during
pretraining. To fill this gap, we ask the following research questions: (1) How
does the number of pretraining languages influence zero-shot performance on
unseen target languages? (2) Does the answer to that question change with model
adaptation? (3) Do the findings for our first question change if the languages
used for pretraining are all related? Our experiments on pretraining with
related languages indicate that choosing a diverse set of languages is crucial.
Without model adaptation, surprisingly, increasing the number of pretraining
languages yields better results up to adding related languages, after which
performance plateaus. In contrast, with model adaptation via continued
pretraining, pretraining on a larger number of languages often gives further
improvement, suggesting that model adaptation is crucial to exploit additional
pretraining languages.
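The "model adaptation via continued pretraining" mentioned in the abstract amounts to resuming masked-language-model training on unlabelled text in the unseen target language before task finetuning. Below is a minimal sketch of that step, assuming xlm-roberta-base as a stand-in multilingual backbone, a hypothetical target_lang.txt corpus, and default hyperparameters; it is an illustration, not the paper's exact setup.

# Minimal sketch: adapt a multilingual model to an unseen language by
# continuing masked-language-model (MLM) pretraining on unlabelled text.
# Model name, corpus path, and hyperparameters are illustrative assumptions.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model_name = "xlm-roberta-base"                     # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabelled target-language text, one sentence per line (hypothetical file).
raw = load_dataset("text", data_files={"train": "target_lang.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-model",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()                          # continued pretraining on the target language
model.save_pretrained("adapted-model")   # finetune this adapted checkpoint on the task

After this adaptation step, the saved checkpoint is finetuned on labelled task data as usual; the paper's finding is that this extra step is what lets models pretrained on many languages keep improving.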
Related papers
- PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment [68.20851615263953]
Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining.
The spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing.
We propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining.
arXiv Detail & Related papers (2024-07-23T06:59:53Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a novel and unintuitive driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet it remains inconclusive whether language imbalance causes cross-lingual generalisation in that setting.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages [40.41642013737395]
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
We aim to understand which model adapts better to languages unseen during pre-training.
We fine-tune both models on 13 unseen languages and 18 seen languages.
arXiv Detail & Related papers (2023-05-21T23:53:12Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Phylogeny-Inspired Adaptation of Multilingual Models to New Languages [43.62238334380897]
We show how language phylogenetic information can be used to improve cross-lingual transfer by leveraging closely related languages.
We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks.
arXiv Detail & Related papers (2022-05-19T15:49:19Z)
- Lifting the Curse of Multilinguality by Pre-training Modular Transformers [72.46919537293068]
Multilingual pre-trained models suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
We introduce language-specific modules, which allow us to grow the total capacity of the model while keeping the total number of trainable parameters per language constant (see the sketch after this list).
Our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
arXiv Detail & Related papers (2022-05-12T17:59:56Z)
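Several of the related papers above (language-family adapters, phylogeny-inspired adaptation, language-specific modules) attach small trainable modules to a frozen multilingual backbone. Below is a minimal PyTorch sketch of such a bottleneck adapter; the hidden and bottleneck sizes, residual placement, and language keys are illustrative assumptions rather than any one paper's exact design.

# Minimal sketch of a bottleneck adapter / language-specific module.
# Sizes and placement are common defaults, not taken from a specific paper.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck inserted after a Transformer sublayer;
    only these parameters are trained when adding a new language."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# One adapter per language (or language family); the shared backbone stays frozen.
adapters = nn.ModuleDict({lang: BottleneckAdapter()
                          for lang in ["target_lang_a", "target_lang_b"]})
x = torch.randn(2, 16, 768)              # (batch, sequence, hidden) activations
out = adapters["target_lang_a"](x)       # route through one language's adapter

Because only the adapter parameters are updated, capacity can grow with the number of languages while per-language trainable parameters stay constant, which is the design motivation shared by these approaches.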
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.