An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models
- URL: http://arxiv.org/abs/2403.20088v1
- Date: Fri, 29 Mar 2024 09:52:18 GMT
- Title: An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models
- Authors: Fahim Faisal, Antonios Anastasopoulos,
- Abstract summary: We propose an textitefficient method to study transfer language influence in zero-shot performance on another target language.
Our findings suggest that some languages do not largely affect others while some languages, especially ones unseen during pre-training, can be extremely beneficial or detrimental for different target languages.
- Score: 26.72394783468532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer is well established. However, phenomena of positive or negative transfer, and the effect of language choice still need to be fully understood, especially in the complex setting of massively multilingual LMs. We propose an \textit{efficient} method to study transfer language influence in zero-shot performance on another target language. Unlike previous work, our approach disentangles downstream tasks from language, using dedicated adapter units. Our findings suggest that some languages do not largely affect others, while some languages, especially ones unseen during pre-training, can be extremely beneficial or detrimental for different target languages. We find that no transfer language is beneficial for all target languages. We do, curiously, observe languages previously unseen by MLMs consistently benefit from transfer from almost any language. We additionally use our modular approach to quantify negative interference efficiently and categorize languages accordingly. Furthermore, we provide a list of promising transfer-target language configurations that consistently lead to target language performance improvements. Code and data are publicly available: https://github.com/ffaisal93/neg_inf
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation [21.980770995466134]
We introduce LEIA, a language adaptation tuning method that utilizes Wikipedia entity names aligned across languages.
This method involves augmenting the target language corpus with English entity names and training the model using left-to-right language modeling.
arXiv Detail & Related papers (2024-02-18T07:24:34Z) - Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z) - Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity
Recognition [0.10641561702689348]
We investigate the properties of cross-lingual transfer learning between ten low-resourced languages.
We find that models that perform well on a single language often do so at the expense of generalising to others.
The amount of data overlap between the source and target datasets is a better predictor of transfer performance than either the geographical or genetic distance between the languages.
arXiv Detail & Related papers (2023-09-11T08:56:47Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Detecting Languages Unintelligible to Multilingual Models through Local
Structure Probes [15.870989191524094]
We develop a general approach that requires only unlabelled text to detect which languages are not well understood by a cross-lingual model.
Our approach is derived from the hypothesis that if a model's understanding is insensitive to perturbations to text in a language, it is likely to have a limited understanding of that language.
arXiv Detail & Related papers (2022-11-09T16:45:16Z) - Lifting the Curse of Multilinguality by Pre-training Modular
Transformers [72.46919537293068]
multilingual pre-trained models suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
We introduce language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant.
Our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
arXiv Detail & Related papers (2022-05-12T17:59:56Z) - Multilingual transfer of acoustic word embeddings improves when training
on languages related to the target zero-resource language [32.170748231414365]
We show that training on even just a single related language gives the largest gain.
We also find that adding data from unrelated languages generally doesn't hurt performance.
arXiv Detail & Related papers (2021-06-24T08:37:05Z) - From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual
Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.