Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs
- URL: http://arxiv.org/abs/2510.23949v1
- Date: Tue, 28 Oct 2025 00:05:00 GMT
- Title: Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs
- Authors: Kyomin Hwang, Hyeonjin Kim, Seungyeon Kim, Sunghyun Wee, Nojun Kwak
- Abstract summary: We introduce the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs. We show that reference-based metrics result in false negatives when the N-Mix score is high, and suggest the need for a new type of unlearning evaluation.
- Score: 29.69282972994522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several studies have shown that attempting to erase multilingual knowledge from multilingual LLMs using only English data is insufficient. However, their analyses remain highly performance-oriented. In this paper, we switch the point of view to evaluation and address an additional blind spot that reveals itself when a multilingual LLM is fully finetuned on a parallel multilingual dataset before unlearning. Here, language confusion occurs, whereby the model responds in a language different from that of the input prompt. Language confusion is a problematic phenomenon in unlearning because it causes standard reference-based metrics to fail. We tackle this phenomenon in three steps: (1) we introduce the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs; (2) we demonstrate that reference-based metrics produce false negatives when the N-Mix score is high; and (3) we argue for a new type of unlearning evaluation that can directly assess the content of the generated sentences. We call this type of metric a semantic-based metric.
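The abstract does not spell out how the N-Mix score is computed, but a rough proxy is easy to sketch. The snippet below is a minimal, hypothetical illustration: it tags each token by Unicode script (a toy stand-in for a real language identifier) and reports the fraction of word n-grams containing a token whose detected language differs from the prompt's. The function names, the toy detector, and the scoring rule are all assumptions for illustration, not the paper's definition.

```python
# Hypothetical sketch of an n-gram language-mix score; NOT the paper's
# exact N-Mix definition, which the abstract does not specify.

def toy_detect_lang(token: str) -> str:
    """Toy language tagger: 'ko' if the token contains Hangul syllables,
    else 'en'. A real setup would use a proper language identifier."""
    if any("\uac00" <= ch <= "\ud7a3" for ch in token):
        return "ko"
    return "en"

def n_mix_score(response: str, prompt_lang: str, n: int = 2) -> float:
    """Fraction of word n-grams containing at least one token whose
    detected language differs from the prompt language (assumed proxy)."""
    tokens = response.split()
    if len(tokens) < n:  # pad so very short responses still yield one n-gram
        tokens += [""] * (n - len(tokens))
    ngrams = [tokens[i : i + n] for i in range(len(tokens) - n + 1)]
    mixed = sum(
        any(tok and toy_detect_lang(tok) != prompt_lang for tok in gram)
        for gram in ngrams
    )
    return mixed / len(ngrams)

# A Korean prompt answered with code-switched output scores high:
print(n_mix_score("그 인물은 a famous scientist 입니다", prompt_lang="ko"))  # 0.8
```

Under this proxy, a high score flags exactly the failure mode described in step (2): a reference-based metric such as ROUGE would compare the code-switched output against a same-language reference, find little surface overlap, and wrongly report the target knowledge as erased.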
Related papers
- Evaluating Cross-Lingual Unlearning in Multilingual Language Models [7.530890774798437]
Subspace projection achieves strong cross-lingual forgetting with minimal degradation. We show that multilingual forgetting depends on geometry in weight space, motivating subspace-based approaches for future unlearning systems.
arXiv Detail & Related papers (2026-01-10T20:27:32Z) - CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention [9.76878200328024]
Large Language Models (LLMs) often exhibit knowledge disparities across languages. We introduce CausalAbstain, a method that helps LLMs determine whether to utilize multiple generated feedback responses. Experiments demonstrate that CausalAbstain effectively selects helpful feedback and enhances abstention decisions with interpretability.
arXiv Detail & Related papers (2025-05-31T11:35:31Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs). We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models. Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion. We show that confusion points (CPs) are central to this phenomenon, and that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z) - Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models [49.16690802656554]
We find that multilingual language models struggle to provide consistent responses to semantically equivalent prompts in different languages. We propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency.
arXiv Detail & Related papers (2025-04-05T19:43:10Z) - Guardians of Discourse: Evaluating LLMs on Multilingual Offensive Language Detection [10.129235204880443]
We evaluate the impact of different prompt languages and augmented translation data for the task in non-English contexts.
We discuss the impact of inherent bias in LLMs and the datasets on mispredictions related to sensitive topics.
arXiv Detail & Related papers (2024-10-21T04:08:16Z) - Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT, and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - MELA: Multilingual Evaluation of Linguistic Acceptability [7.524375463656369]
We present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability -- MELA, with 46K samples covering 10 languages.
In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R.
Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial.
arXiv Detail & Related papers (2023-11-15T15:25:28Z) - Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)