Related papers: Layer-Targeted Multilingual Knowledge Erasure in Large Language Models

Layer-Targeted Multilingual Knowledge Erasure in Large Language Models

URL: http://arxiv.org/abs/2602.22562v1
Date: Thu, 26 Feb 2026 03:00:07 GMT
Title: Layer-Targeted Multilingual Knowledge Erasure in Large Language Models
Authors: Taoran Li, Varun Chandrasekaran, Zhiyuan Yu,
Abstract summary: We identify intervention depth as the key factor determining multilingual generalization.<n>We propose MUTE, a framework that uses Centered Kernel Alignment (CKA) and Linguistic Regions Development Score (LRDS) to identify intermediate, language-agnostic layers.
Score: 15.409568435026015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent work has demonstrated that machine unlearning in Large Language Models (LLMs) fails to generalize across languages: knowledge erased in one language frequently remains accessible through others. However, the underlying cause of this failure and a principled solution remain open. In this work, we identify intervention depth as the key factor determining multilingual generalization. Through systematic layer-wise experiments, we characterize two distinct failure modes: shallow-layer interventions achieve erasure but collapse multilingual capabilities in held-out languages, while deep-layer interventions preserve utility but fail to erase target knowledge even in source languages. These findings reveal that the choice of intervention layer is not a free parameter; it fundamentally determines whether multilingual unlearning succeeds. We propose MUTE (Multilingual Unlearning via Targeted Erasure), a framework that uses Centered Kernel Alignment (CKA) and Linguistic Regions Development Score (LRDS) to identify intermediate, language-agnostic layers where cross-lingual representations converge. By restricting unlearning updates to these layers, MUTE achieves robust multilingual knowledge erasure while optimizing on only a small set of source languages. Extensive experiments across three LLM architectures and three unlearning algorithms validate our approach, with mechanistic analysis via Logit Lens probing confirming genuine knowledge removal rather than output-level suppression.

Related papers

Evaluating Cross-Lingual Unlearning in Multilingual Language Models [7.530890774798437]
Subspace-projection achieves strong cross-lingual forgetting with minimal degradation.<n>We show that multilingual forgetting depends on geometry in weight space, motivating subspace-based approaches for future unlearning systems.
arXiv Detail & Related papers (2026-01-10T20:27:32Z)
CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention [9.76878200328024]
Large Language Models (LLMs) often exhibit knowledge disparities across languages.<n>We introduce textitCausalAbstain, a method that helps LLMs determine whether to utilize multiple generated feedback responses.<n>Experiments demonstrate that textitCausalAbstain effectively selects helpful feedback and enhances abstention decisions with interpretability.
arXiv Detail & Related papers (2025-05-31T11:35:31Z)
Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline [36.2731426595852]
We find that multilingual large language models (LLMs) exhibit significantly better performance in factual recall tasks in English than in other languages.<n>We identify two primary sources of error: insufficient engagement of the reliable English-centric mechanism for factual recall, and incorrect translation from English back into the target language.<n>Our interventions combined increase the recall accuracy by over 35 percent for the lowest-performing language.
arXiv Detail & Related papers (2025-05-26T22:20:45Z)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion.<n>We show that confusion points (CPs) are central to this phenomenon.<n>We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.<n>Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models [49.16690802656554]
We find that Multilingual factual models struggle to provide consistent responses to semantically equivalent prompts in different languages.<n>We propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency.
arXiv Detail & Related papers (2025-04-05T19:43:10Z)
How Do Multilingual Language Models Remember Facts? [50.13632788453612]
We show that previously identified recall mechanisms in English largely apply to multilingual contexts.<n>We localize the role of language during recall, finding that subject enrichment is language-independent.<n>In decoder-only LLMs, FVs compose these two pieces of information in two separate stages.
arXiv Detail & Related papers (2024-10-18T11:39:34Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
We propose Lens, a novel approach to enhance multilingual capabilities in large language models (LLMs)<n>Lens operates on two subspaces: the language-agnostic subspace, where it aligns target languages with the central language to inherit strong semantic representations, and the language-specific subspace, where it separates target and central languages to preserve linguistic specificity.<n>Lens significantly improves multilingual performance while maintaining the model's English proficiency, achieving better results with less computational cost compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.<n>But can these models relate corresponding concepts across languages, i.e., be crosslingual?<n>This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism. LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z)
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations [38.56175462620892]
Large pretrained multilingual language models (ML-LMs) have shown remarkable capabilities of zero-shot cross-lingual transfer. We present a novel view of projecting away language-specific factors from a multilingual embedding space. We show that applying our method consistently leads to improvements over commonly used ML-LMs.
arXiv Detail & Related papers (2024-01-11T09:54:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.