On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
- URL: http://arxiv.org/abs/2010.03017v1
- Date: Tue, 6 Oct 2020 20:48:58 GMT
- Title: On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
- Authors: Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov
- Abstract summary: We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
- Score: 59.995385574274785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern multilingual models are trained on concatenated text from multiple
languages in hopes of conferring benefits to each (positive transfer), with the
most pronounced benefits accruing to low-resource languages. However, recent
work has shown that this approach can degrade performance on high-resource
languages, a phenomenon known as negative interference. In this paper, we
present the first systematic study of negative interference. We show that,
contrary to previous belief, negative interference also impacts low-resource
languages. While parameters are maximally shared to learn language-universal
structures, we demonstrate that language-specific parameters do exist in
multilingual models and they are a potential cause of negative interference.
Motivated by these observations, we also present a meta-learning algorithm that
obtains better cross-lingual transferability and alleviates negative
interference, by adding language-specific layers as meta-parameters and
training them in a manner that explicitly improves shared layers'
generalization on all languages. Overall, our results show that negative
interference is more common than previously known, suggesting new directions
for improving multilingual representations.
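The treatment described above (language-specific layers kept as meta-parameters, with shared layers trained so that they generalize across all languages after the language-specific layers adapt) can be illustrated with a minimal first-order sketch. This is not the authors' algorithm: the module interfaces, the SGD optimizers, the learning rates, and the support/query batch format below are assumptions made purely for illustration.
```python
# Hypothetical first-order sketch of the meta-learning idea in the abstract.
# All names (shared, lang_heads, batches_by_lang) are illustrative assumptions.
import copy
import torch

def meta_step(shared, lang_heads, batches_by_lang, loss_fn,
              inner_lr=1e-3, outer_lr=1e-4):
    """One meta-update across languages.

    shared          -- torch.nn.Module shared by all languages
    lang_heads      -- dict: language -> torch.nn.Module (language-specific layers)
    batches_by_lang -- dict: language -> (support, query), each a dict with "x", "y"
    """
    outer_opt = torch.optim.SGD(shared.parameters(), lr=outer_lr)
    outer_opt.zero_grad()

    for lang, (support, query) in batches_by_lang.items():
        # Inner loop: adapt a copy of the language-specific head on the support
        # batch; shared features are detached so shared parameters are untouched.
        head = copy.deepcopy(lang_heads[lang])
        inner_opt = torch.optim.SGD(head.parameters(), lr=inner_lr)
        with torch.no_grad():
            support_feats = shared(support["x"])
        inner_opt.zero_grad()
        loss_fn(head(support_feats), support["y"]).backward()
        inner_opt.step()

        # Outer loss: how well the shared layers serve the adapted head on
        # held-out query data. Only the shared parameters are stepped by the
        # outer optimizer (first-order: the adapted head is treated as fixed).
        outer_loss = loss_fn(head(shared(query["x"])), query["y"])
        outer_loss.backward()

        # Keep the adapted language-specific layers as the new meta-parameters.
        lang_heads[lang].load_state_dict(head.state_dict())

    outer_opt.step()
```
The point the abstract emphasizes is the two-level structure: the shared parameters receive an outer update driven by how well they generalize to every language after language-specific adaptation, while the language-specific layers are carried along as meta-parameters.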
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- Understanding the effects of language-specific class imbalance in multilingual fine-tuning [0.0]
We show that fine-tuning a transformer-based Large Language Model (LLM) on a dataset with a language-specific class imbalance leads to worse performance.
We modify the traditional class weighting approach to imbalance by calculating class weights separately for each language (a generic per-language weighting sketch appears after this list).
arXiv Detail & Related papers (2024-02-20T13:59:12Z)
- Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ of the pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models (a generic contrastive-alignment sketch appears after this list).
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
- Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding [17.12010497289781]
We argue that leveraging multiple positives should be considered for multi-lingual sentence embeddings.
We propose a novel approach, named MPCL, to effectively utilize multiple positive instances to improve the learning of multi-lingual sentence embeddings.
arXiv Detail & Related papers (2023-09-16T08:54:30Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing the identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction (a generic head-scoring sketch appears after this list).
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Lifting the Curse of Multilinguality by Pre-training Modular Transformers [72.46919537293068]
Multilingual pre-trained models suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
We introduce language-specific modules, which allow us to grow the total capacity of the model while keeping the total number of trainable parameters per language constant (a generic modular-block sketch appears after this list).
Our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
arXiv Detail & Related papers (2022-05-12T17:59:56Z)
- On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning [7.493779672689531]
The knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one.
This paper analyses the relationship between them, in the context of fine-tuning on two tasks.
arXiv Detail & Related papers (2021-09-14T19:28:31Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z)
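For the "Understanding the effects of language-specific class imbalance in multilingual fine-tuning" entry above, here is a minimal sketch of computing class weights separately per language rather than over the pooled corpus. The inverse-frequency formula and all names are assumptions made for illustration, not the paper's exact method.
```python
# Generic per-language class weighting sketch (assumed formulation).
from collections import Counter

def per_language_class_weights(examples):
    """examples: iterable of (language, label) pairs.
    Returns {language: {label: weight}} using inverse-frequency weights
    computed within each language separately."""
    by_lang = {}
    for lang, label in examples:
        by_lang.setdefault(lang, []).append(label)

    weights = {}
    for lang, labels in by_lang.items():
        counts = Counter(labels)
        n_classes = len(counts)
        total = len(labels)
        # Inverse-frequency weighting restricted to this language's examples.
        weights[lang] = {c: total / (n_classes * k) for c, k in counts.items()}
    return weights

# Example: the minority class in each language gets a proportionally larger weight.
demo = [("en", "pos")] * 90 + [("en", "neg")] * 10 + [("de", "pos")] * 40 + [("de", "neg")] * 60
print(per_language_class_weights(demo))
```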
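For the "Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment" entry, here is a generic sketch of multilingual contrastive learning over translation pairs (symmetric InfoNCE with in-batch negatives). The pooling, temperature, and tensor shapes are assumptions, not the paper's exact framework.
```python
# Generic contrastive alignment sketch over translation-pair representations.
import torch
import torch.nn.functional as F

def translation_pair_contrastive_loss(src_repr, tgt_repr, temperature=0.05):
    """src_repr, tgt_repr: (batch, dim) sentence representations of translation
    pairs, where row i of src_repr translates row i of tgt_repr. Other rows in
    the batch serve as in-batch negatives."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.T / temperature            # scaled cosine similarities
    targets = torch.arange(src.size(0), device=src.device)
    # Symmetric InfoNCE: align source -> target and target -> source.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Usage sketch: in practice the representations would come from pooled hidden
# states of a multilingual LM for each side of a translation pair.
src = torch.randn(8, 256)
tgt = torch.randn(8, 256)
print(translation_pair_contrastive_loss(src, tgt).item())
```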
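For the "Shapley Head Pruning" entry, here is a generic permutation-sampling sketch of scoring attention heads by estimated Shapley contributions for a target language and selecting harmful heads for removal. The evaluation interface and the toy demo are assumptions, not the paper's procedure.
```python
# Generic Monte Carlo (permutation-sampling) Shapley sketch for head scoring.
import random

def shapley_head_scores(heads, eval_fn, num_samples=200):
    """heads: list of attention-head identifiers.
    eval_fn(active_heads) -> task score for the target language when only the
    heads in `active_heads` are left unmasked.
    Returns {head: estimated Shapley contribution}."""
    scores = {h: 0.0 for h in heads}
    for _ in range(num_samples):
        order = list(heads)
        random.shuffle(order)
        active = []
        prev = eval_fn(active)
        for h in order:
            active.append(h)
            cur = eval_fn(active)
            scores[h] += (cur - prev) / num_samples   # marginal contribution of h
            prev = cur
    return scores

def heads_to_prune(heads, eval_fn, k=2):
    # Heads with the most negative estimated contribution are pruning candidates.
    scores = shapley_head_scores(heads, eval_fn)
    return sorted(heads, key=lambda h: scores[h])[:k]

# Toy demo: pretend each head has a fixed additive (possibly negative) effect.
true_effect = {("layer0", i): e for i, e in enumerate([0.3, -0.2, 0.1, -0.4])}
def demo_eval(active):
    return sum(true_effect[h] for h in active)
print(heads_to_prune(list(true_effect), demo_eval, k=2))
```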
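For the "Lifting the Curse of Multilinguality by Pre-training Modular Transformers" entry, here is a generic sketch of a block that combines shared weights with language-specific modules, so that adding a language grows total capacity while the parameters active per language stay constant. The layer shapes and routing are assumptions, not the paper's architecture.
```python
# Generic shared-plus-language-specific modular block sketch.
import torch
import torch.nn as nn

class ModularBlock(nn.Module):
    def __init__(self, hidden_dim, bottleneck_dim, languages):
        super().__init__()
        self.shared = nn.Linear(hidden_dim, hidden_dim)      # shared across languages
        self.lang_modules = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(hidden_dim, bottleneck_dim),
                nn.ReLU(),
                nn.Linear(bottleneck_dim, hidden_dim),
            )
            for lang in languages
        })

    def add_language(self, lang, bottleneck_dim):
        # Post-hoc extension: only the new language module needs training.
        hidden_dim = self.shared.in_features
        self.lang_modules[lang] = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim),
            nn.ReLU(),
            nn.Linear(bottleneck_dim, hidden_dim),
        )

    def forward(self, x, lang):
        h = torch.relu(self.shared(x))
        return h + self.lang_modules[lang](h)   # residual language-specific path

block = ModularBlock(hidden_dim=64, bottleneck_dim=16, languages=["en", "sw"])
block.add_language("is", bottleneck_dim=16)
print(block(torch.randn(2, 64), lang="is").shape)   # torch.Size([2, 64])
```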
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.