Related papers: Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

URL: http://arxiv.org/abs/2407.01358v1
Date: Mon, 1 Jul 2024 15:11:37 GMT
Title: Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models
Authors: Xiaolin Xing, Zhiwei He, Haoyu Xu, Xing Wang, Rui Wang, Yu Hong,
Abstract summary: Large Language Models (LLMs) have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.
Score: 16.942897938964638
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.To address these questions, we propose an innovative evaluation method for Cross-lingual Semantic Consistency (xSC) using the LaBSE model. We further introduce metrics for Cross-lingual Accuracy Consistency (xAC) and Cross-lingual Timeliness Consistency (xTC) to comprehensively assess the models' performance regarding semantic, accuracy, and timeliness inconsistencies. By harmonizing these metrics, we provide a holistic measurement of LLMs' cross-lingual consistency. Our findings aim to enhance the understanding and improvement of multilingual capabilities and interpretability in LLMs, contributing to the development of more robust and reliable multilingual language models.

Related papers

Optimizing Language Models for Crosslingual Knowledge Consistency [90.86445137816942]
Large language models are known to often exhibit inconsistent knowledge.<n>This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages.<n>In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function.
arXiv Detail & Related papers (2026-03-04T23:36:55Z)
On the Entity-Level Alignment in Crosslingual Consistency [62.33186691736433]
SubSub and SubInj integrate English translations of subjects into prompts across languages, leading to substantial gains in factual recall accuracy and consistency.<n>These interventions reinforce the entity representation alignment in the conceptual space through model's internal pivot-language processing.
arXiv Detail & Related papers (2025-10-11T16:26:50Z)
A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages [48.68444770923683]
We present the first comprehensive study of multilingual Chain-of-Thought (CoT) reasoning.<n>We measure language compliance, answer accuracy, and answer consistency when LRMs are prompt-hacked to think in a target language.<n>We find that the quality and effectiveness of thinking traces vary substantially depending on the prompt language.
arXiv Detail & Related papers (2025-10-10T17:06:50Z)
Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent? [28.76156047784995]
Cross-lingual consistency should be considered to assess cross-lingual transferability.<n>Code-switching training and cross-lingual word alignment objectives show the most promising results.
arXiv Detail & Related papers (2025-07-17T06:55:15Z)
Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z)
Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models [2.3749120526936465]
Large language models (LLMs) resolve relative clause (RC) attachment ambiguities. We assess whether LLMs can achieve human-like interpretations amid the complexities of language. We evaluate models in English, Spanish, French, German, Japanese, and Korean.
arXiv Detail & Related papers (2025-03-04T19:56:56Z)
Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models [53.38288894305388]
Multilingual large language models (MLLMs) are able to leverage in-context learning (ICL) to achieve high performance by leveraging cross-lingual knowledge transfer without parameter updates. Three key factors influence multilingual ICL: (1) semantic similarity, (2) linguistic alignment, and (3) language-specific performance. We propose balanced multi-factor ICL (textbfBMF-ICL), a method that quantifies and optimally balances these factors for improved example selection.
arXiv Detail & Related papers (2025-02-17T06:56:33Z)
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering [42.92810049636768]
Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. We explore the Cross-Lingual Self-Aligning ability of Language Models (CALM) to align knowledge across languages. We employ direct preference optimization (DPO) to align the model's knowledge across different languages.
arXiv Detail & Related papers (2025-01-30T16:15:38Z)
P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.<n>P-MMEval delivers consistent language coverage across various datasets and provides parallel samples.<n>We conduct extensive experiments on representative multilingual model series to compare performances across models and tasks.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models [23.384966485398184]
This paper introduces a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on Large Language Models (LLMs) The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for LA tasks and one for the LS task tailored to specific social, cultural, and geographical contexts of a language. Experiments reveal that on LA tasks English exhibits notable linguistic dominance in confidence estimations than other languages, while on LS tasks, using question-related language to prompt LLMs demonstrates better linguistic dominance in multilingual confidence estimations.
arXiv Detail & Related papers (2024-10-16T11:46:55Z)
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance. Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment [38.35458193262633]
English-centric models are usually suboptimal in other languages. We propose a novel approach called CrossIn, which utilizes a mixed composition of cross-lingual instruction tuning data.
arXiv Detail & Related papers (2024-04-18T06:20:50Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models [23.384966485398184]
This paper introduces a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on Large Language Models (LLMs) The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for LA tasks and one for the LS task tailored to specific social, cultural, and geographical contexts of a language. Experiments reveal that on LA tasks English exhibits notable linguistic dominance in confidence estimations than other languages, while on LS tasks, using question-related language to prompt LLMs demonstrates better linguistic dominance in multilingual confidence estimations.
arXiv Detail & Related papers (2024-02-21T08:20:06Z)
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models [2.6626950367610402]
We study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. We propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently from accuracy.
arXiv Detail & Related papers (2023-10-16T13:19:17Z)
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process. We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm. We provide insights into what makes multilingual sequential learning particularly challenging. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
Cross-lingual TRansfer Evaluation of Multilinguals XTREME is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.