Related papers: Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models

Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models

URL: http://arxiv.org/abs/2505.18673v1
Date: Sat, 24 May 2025 12:31:27 GMT
Title: Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
Authors: Zixiang Xu, Yanbo Wang, Yue Huang, Xiuying Chen, Jieyu Zhao, Meng Jiang, Xiangliang Zhang,
Abstract summary: This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
Score: 55.14276067678253
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have achieved remarkable success in Natural Language Processing (NLP), yet their cross-lingual performance consistency remains a significant challenge. This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in LLMs. Our approach leverages beam search and LLM-based simulation to generate bilingual question pairs that expose performance discrepancies between English and target languages. We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models. The extensive experiments demonstrate that our method precisely and cost-effectively pinpoints cross-lingual weaknesses, consistently revealing over 50\% accuracy drops in target languages across a wide range of models. Moreover, further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns and benefit from targeted post-training. Code is available at https://github.com/xzx34/Cross-Lingual-Pitfalls.

Related papers

MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages [33.450081592217074]
We introduce MuBench, a benchmark covering 61 languages and evaluating a broad range of capabilities.<n>We evaluate several state-of-the-art multilingual LLMs and find notable gaps between claimed and actual language coverage.
arXiv Detail & Related papers (2025-06-24T09:53:00Z)
Less Data Less Tokens: Multilingual Unification Learning for Efficient Test-Time Reasoning in LLMs [13.618284161265123]
This paper explores the challenges of test-time scaling of large language models (LLMs)<n>We highlight the diversity of multi-lingual reasoning based on our pilot studies.<n>We introduce a novel approach, (L2) multi-lingual unification learning.
arXiv Detail & Related papers (2025-06-23T06:47:28Z)
Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples [38.18495961129682]
This paper introduces a novel cross-lingual search task that does not require a large semantic corpus.<n>It focuses on the ability of a model to cross-lingually rank the true parallel sentence higher than challenging distractors generated by a large language model.<n>We create a case study of our introduced CLSD task for the language pair German-French in the news domain.
arXiv Detail & Related papers (2025-02-12T18:54:37Z)
Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models [16.942897938964638]
Large Language Models (LLMs) have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.
arXiv Detail & Related papers (2024-07-01T15:11:37Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.<n>But can these models relate corresponding concepts across languages, i.e., be crosslingual?<n>This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance. Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
MELA: Multilingual Evaluation of Linguistic Acceptability [7.524375463656369]
We present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability -- MELA, with 46K samples covering 10 languages. In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R. Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial.
arXiv Detail & Related papers (2023-11-15T15:25:28Z)
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks. We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages. We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC) LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language. We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language. We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models. Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.