LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
- URL: http://arxiv.org/abs/2511.14774v1
- Date: Mon, 03 Nov 2025 17:06:49 GMT
- Title: LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
- Authors: Pei-Fu Guo, Yun-Da Tsai, Chun-Chia Hsu, Kai-Xin Chen, Ya-An Tsai, Kai-Wei Chang, Nanyun Peng, Mi-Yen Yeh, Shou-De Lin
- Abstract summary: We present LiveCLKTBench, an automated generation pipeline designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains. The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating cross-lingual knowledge transfer in large language models is challenging, as correct answers in a target language may arise either from genuine transfer or from prior exposure during pre-training. We present LiveCLKTBench, an automated generation pipeline specifically designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains, filters them based on temporal occurrence, and verifies them against the model's knowledge. The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages to evaluate transferability across linguistic boundaries. Using LiveCLKTBench, we evaluate several LLMs across five languages and observe that cross-lingual transfer is strongly influenced by linguistic distance and often asymmetric across language directions. While larger models improve transfer, the gains diminish with scale and vary across domains. These findings provide new insights into multilingual transfer and demonstrate the value of LiveCLKTBench as a reliable benchmark for future research.
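The pipeline the abstract describes (identify time-sensitive entities, filter by temporal occurrence, verify against the model's knowledge, generate questions, translate) can be sketched as follows. This is a minimal illustration of the described stages, not the authors' implementation; every name, type, and filtering rule here is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Entity:
    name: str
    first_seen: date   # when the fact entered the real world (assumed field)
    document: str      # source text describing the entity

def build_benchmark(entities, model_cutoff, model_knows,
                    make_question, translate, languages):
    """Sketch of the generation pipeline: temporal filtering, knowledge
    verification, question generation, and translation."""
    items = []
    for e in entities:
        # 1) Temporal filter: keep only facts newer than the model's
        #    training cutoff, so a correct answer in any language cannot
        #    come from pre-training exposure.
        if e.first_seen <= model_cutoff:
            continue
        # 2) Knowledge verification: keep entities the model can answer
        #    about in the source language (callback supplied by the caller).
        if not model_knows(e):
            continue
        # 3) Generate a factual question from the document, then translate
        #    it into each evaluation language.
        q = make_question(e.document)
        items.append({lang: translate(q, lang) for lang in languages})
    return items
```

In an actual run, `model_knows`, `make_question`, and `translate` would be backed by an LLM and an MT system; here they are plain callables so the control flow of the pipeline stands on its own.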
Related papers
- Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics [56.145578792496714]
Large language models (LLMs) struggle with cross-lingual knowledge transfer. We study the causes and dynamics of this phenomenon by training small Transformer models from scratch on synthetic multilingual datasets.
arXiv Detail & Related papers (2025-08-14T18:44:13Z) - ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer [40.3285891624575]
We present ECLeKTic, a multilingual closed-book QA dataset that Evaluates Cross-Lingual Knowledge Transfer. We used the presence and absence of Wikipedia articles in 12 languages to detect pieces of information that were likely available during pre-training in one of the languages but not in the others. We show that current SOTA models struggle to effectively share knowledge across languages, even if they can predict the answer for questions in the language in which the knowledge was acquired.
arXiv Detail & Related papers (2025-02-28T16:59:30Z) - Language Models' Factuality Depends on the Language of Inquiry [36.466186024957075]
We introduce a benchmark of 10,000 country-related facts across 13 languages. We propose three novel metrics: Factual Recall Score, Knowledge Transferability Score, and Cross-Lingual Factual Knowledge Transferability Score. Our results reveal fundamental weaknesses in today's state-of-the-art LMs.
arXiv Detail & Related papers (2025-02-25T08:27:18Z) - Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges [48.96952594416528]
Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability. The XTransplant framework enables models to harness the complementary strengths of both English and non-English resources by transplanting latent activations across languages.
arXiv Detail & Related papers (2024-12-17T09:05:30Z) - Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets [4.653113033432781]
We investigate the cross-lingual transfer capabilities of Multilingual Language Models (MLLMs) on adversarial datasets.
Our research provides valuable insights into cross-lingual transfer and its implications for NLP applications.
arXiv Detail & Related papers (2024-03-29T08:47:15Z) - Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models [12.662039551306632]
We show that the observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge. More specifically, we observe that what has been transferred across languages is mostly data artifacts and biases, especially for low-resource languages.
arXiv Detail & Related papers (2024-02-03T09:41:52Z) - Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z) - From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z) - Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.