What Causes Knowledge Loss in Multilingual Language Models?
- URL: http://arxiv.org/abs/2504.20356v1
- Date: Tue, 29 Apr 2025 01:49:09 GMT
- Title: What Causes Knowledge Loss in Multilingual Language Models?
- Authors: Maria Khelli, Samuel Cahyawijaya, Ayu Purwarianti, Genta Indra Winata,
- Abstract summary: Cross-lingual transfer in natural language processing (NLP) models enhances multilingual performance by leveraging shared linguistic knowledge.<n>Traditional methods that process all data simultaneously often fail to mimic real-world scenarios, leading to challenges like catastrophic forgetting.<n>Our study explores this issue in multilingual contexts, focusing on linguistic differences affecting representational learning rather than just model parameters.
- Score: 26.094144160398443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual transfer in natural language processing (NLP) models enhances multilingual performance by leveraging shared linguistic knowledge. However, traditional methods that process all data simultaneously often fail to mimic real-world scenarios, leading to challenges like catastrophic forgetting, where fine-tuning on new tasks degrades performance on previously learned ones. Our study explores this issue in multilingual contexts, focusing on linguistic differences affecting representational learning rather than just model parameters. We experiment with 52 languages using LoRA adapters of varying ranks to evaluate non-shared, partially shared, and fully shared parameters. Our aim is to see if parameter sharing through adapters can mitigate forgetting while preserving prior knowledge. We find that languages using non-Latin scripts are more susceptible to catastrophic forgetting, whereas those written in Latin script facilitate more effective cross-lingual transfer.
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.<n>But can these models relate corresponding concepts across languages, i.e., be crosslingual?<n>This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Unknown Script: Impact of Script on Cross-Lingual Transfer [2.5398014196797605]
Cross-lingual transfer has become an effective way of transferring knowledge between languages.
We consider a case where the target language and its script are not part of the pre-trained model.
Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.
arXiv Detail & Related papers (2024-04-29T15:48:01Z) - Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
It is still unknown the effect of source language editing on a different target language.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z) - Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies [4.118037156777793]
Cross-lingual transfer is a promising technique to solve tasks in less-resourced languages.<n>We compare two fine-tuning approaches combined with zero-shot and full-shot learning approaches for large language models.
arXiv Detail & Related papers (2023-09-12T09:37:08Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z) - Language Agnostic Multilingual Information Retrieval with Contrastive
Learning [59.26316111760971]
We present an effective method to train multilingual information retrieval systems.
We leverage parallel and non-parallel corpora to improve the pretrained multilingual language models.
Our model can work well even with a small number of parallel sentences.
arXiv Detail & Related papers (2022-10-12T23:53:50Z) - Improving Zero-Shot Multi-Lingual Entity Linking [14.502266106371433]
We consider multilingual entity linking, where a single model is trained to link references to same-language knowledge bases in several languages.
We propose a neural ranker architecture, which leverages multilingual transformer representations of text to be easily applied to a multilingual setting.
We find that using this approach improves recall in several datasets, often matching the in-language performance.
arXiv Detail & Related papers (2021-04-16T12:50:07Z) - Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z) - Zero-Shot Cross-Lingual Transfer with Meta Learning [45.29398184889296]
We consider the setting of training models on multiple languages at the same time, when little or no data is available for languages other than English.
We show that this challenging setup can be approached using meta-learning.
We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks.
arXiv Detail & Related papers (2020-03-05T16:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.