Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
- URL: http://arxiv.org/abs/2406.16135v1
- Date: Sun, 23 Jun 2024 15:15:17 GMT
- Title: Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
- Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang,
- Abstract summary: Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
- Score: 62.91524967852552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation and embedding space analyses, they struggle with deeper crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in both general (MMLU benchmark) and domain-specific (Harry Potter quiz) contexts. We observe that simple inference-time mitigation methods offer only limited improvement. On the other hand, we propose fine-tuning of LLMs on mixed-language data, which effectively reduces these gaps, even when using out-of-domain datasets like WikiText. Our findings suggest the need for explicit optimization to unlock the full crosslingual potential of LLMs. Our code is publicly available at https://github.com/google-research/crosslingual-knowledge-barriers.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models [22.859955360764275]
We introduce the MultiLingual Needle-in-a-Haystack (MLNeedle) test to assess a model's ability to retrieve relevant information.
We evaluate four state-of-the-art large language models on MLNeedle.
arXiv Detail & Related papers (2024-08-19T17:02:06Z) - Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation [25.850573463743352]
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks.
Yet significant performance disparities exist across different languages within the same mPLM.
We introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM.
arXiv Detail & Related papers (2024-04-12T14:19:16Z) - UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised
Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z) - Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z) - How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages.
This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.