Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
- URL: http://arxiv.org/abs/2504.04264v1
- Date: Sat, 05 Apr 2025 19:43:10 GMT
- Title: Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
- Authors: Mingyang Wang, Heike Adel, Lukas Lange, Yihong Liu, Ercong Nie, Jannik Strötgen, Hinrich Schütze
- Abstract summary: We find that multilingual language models (MLMs) struggle to provide consistent responses to semantically equivalent prompts in different languages. We propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency.
- Score: 49.16690802656554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual language models (MLMs) store factual knowledge across languages but often struggle to provide consistent responses to semantically equivalent prompts in different languages. While previous studies point out this cross-lingual inconsistency issue, the underlying causes remain unexplored. In this work, we use mechanistic interpretability methods to investigate cross-lingual inconsistencies in MLMs. We find that MLMs encode knowledge in a language-independent concept space through most layers, and only transition to language-specific spaces in the final layers. Failures during the language transition often result in incorrect predictions in the target language, even when the answers are correct in other languages. To mitigate this inconsistency issue, we propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency. Our findings shed light on the internal mechanisms of MLMs and provide a lightweight, effective strategy for producing more consistent factual outputs.
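The abstract describes the shortcut only at a high level; below is a minimal sketch of one plausible reading, assuming the shortcut is a learned linear map that takes a mid-layer hidden state directly to vocabulary logits. The layer index, model wiring, and training setup are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn

class LinearShortcut(nn.Module):
    """Hypothetical linear shortcut: map a mid-layer hidden state straight
    to vocabulary logits, bypassing the remaining transformer layers."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_size), e.g. the last-token state
        # taken from an intermediate layer of the multilingual LM
        return self.proj(hidden_state)

# Usage sketch with a Hugging Face-style causal LM (SHORTCUT_LAYER is an
# assumed hyperparameter; the shortcut would be trained to reproduce the
# correct answer tokens from the intermediate representation):
# outputs = model(**inputs, output_hidden_states=True)
# h_mid = outputs.hidden_states[SHORTCUT_LAYER][:, -1, :]
# logits = shortcut(h_mid)   # skips the final, language-transition layers
```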
Related papers
- How Do Multilingual Language Models Remember Facts? [50.13632788453612]
We show that previously identified recall mechanisms in English largely apply to multilingual contexts.
We localize the role of language during recall, finding that subject enrichment is language-independent.
In decoder-only LLMs, FVs compose these two pieces of information in two separate stages.
arXiv Detail & Related papers (2024-10-18T11:39:34Z)
- NatLan: Native Language Prompting Facilitates Knowledge Elicitation Through Language Trigger Provision and Domain Trigger Retention [4.051397588437236]
We analogize the dominant language of MLLMs to the native language of humans and use two human cognitive features to interpret translate-then-answer methods.
We propose Native Language Prompting (NatLan), employing a Multi-MLLM collaboration strategy and introducing an additional role-enhanced domain-specific MLLM.
arXiv Detail & Related papers (2024-08-07T04:49:38Z)
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations [38.56175462620892]
Large pretrained multilingual language models (ML-LMs) have shown remarkable capabilities of zero-shot cross-lingual transfer.
We present a novel view of projecting away language-specific factors from a multilingual embedding space.
We show that applying our method consistently leads to improvements over commonly used ML-LMs. (A sketch of the projection idea follows this entry.)
arXiv Detail & Related papers (2024-01-11T09:54:11Z) - Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
- Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
The effect of editing in a source language on a different target language remains unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language. (A sketch of this loss follows this entry.)
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
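A rough sketch of a KL-divergence self-teaching loss of the kind the FILTER entry describes, assuming the soft pseudo-labels come from a detached (frozen) forward pass over the translated target-language text. The function name and the teacher/student split are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def self_teaching_kl_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between the teacher's soft pseudo-label distribution
    and the student's predictions on the translated target-language input."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # 'batchmean' matches the mathematical definition of KL divergence.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```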