CLUE: Conflict-guided Localization for LLM Unlearning Framework
- URL: http://arxiv.org/abs/2509.20977v1
- Date: Thu, 25 Sep 2025 10:23:16 GMT
- Title: CLUE: Conflict-guided Localization for LLM Unlearning Framework
- Authors: Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang
- Abstract summary: We propose the Conflict-guided Localization for LLM Unlearning framEwork (CLUE). The framework identifies the forget and retain circuits composed of important neurons, which are then transformed into conjunctive normal form. Experiments demonstrate that CLUE achieves superior forget efficacy and retain utility through precise neural localization.
- Score: 35.90665719234101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLM unlearning aims to eliminate the influence of undesirable data without affecting causally unrelated information. This process typically uses a forget set to remove target information and a retain set to maintain non-target capabilities. While recent localization-based methods show promise in identifying important neurons to be unlearned, they fail to disentangle neurons responsible for forgetting undesirable knowledge from those responsible for retaining essential skills, often treating them as a single entangled group. As a result, these methods apply uniform interventions, risking catastrophic over-forgetting or incomplete erasure of the target knowledge. To address this, we turn to circuit discovery, a mechanistic interpretability technique, and propose the Conflict-guided Localization for LLM Unlearning framEwork (CLUE). The framework identifies the forget and retain circuits composed of important neurons, then transforms the circuits into conjunctive normal form (CNF). The assignment of each neuron in the CNF satisfiability solution reveals whether it should be forgotten or retained. We then provide targeted fine-tuning strategies for the different categories of neurons. Extensive experiments demonstrate that, compared to existing localization methods, CLUE achieves superior forget efficacy and retain utility through precise neural localization.
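The CNF-satisfiability step can be sketched in miniature. This is a hypothetical toy, not the paper's actual encoding: each neuron becomes a Boolean variable, clauses encode circuit-membership constraints, and a satisfying assignment labels each neuron as forget (True) or retain (False).

```python
from itertools import product

def satisfying_assignment(num_vars, cnf):
    """Brute-force SAT for tiny instances; cnf is a list of clauses,
    each clause a list of ints (+i means var i is True, -i means False)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return bits
    return None

# Toy example: neuron 1 sits in the forget circuit, neuron 2 in the retain
# circuit, and neuron 3 appears in both -- the conflict to be resolved.
# The last clause says neurons 1 and 3 cannot both be forgotten.
cnf = [[1], [-2], [3, -3], [-1, -3]]
assignment = satisfying_assignment(3, cnf)
labels = {n: ("forget" if v else "retain")
          for n, v in zip([1, 2, 3], assignment)}
print(labels)  # {1: 'forget', 2: 'retain', 3: 'retain'}
```

A real instance would use a proper SAT solver rather than enumeration; the point is only that the satisfying assignment partitions neurons into forget and retain categories without conflict.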
Related papers
- KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models [26.418820118903852]
Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora. LLM unlearning is a promising technique for reducing the risks associated with sensitive, copyrighted, or harmful content in training data. We propose Knowledge Unlearning by Deviating representAtion (KUDA) to achieve effective unlearning at the knowledge level of LLMs.
arXiv Detail & Related papers (2026-02-22T17:16:49Z) - Safety Alignment via Constrained Knowledge Unlearning [11.225354394106226]
We propose a novel safety alignment strategy, Constrained Knowledge Unlearning (CKU). CKU focuses on two primary objectives: knowledge localization and retention, and unlearning harmful knowledge. Experimental results demonstrate that CKU significantly enhances model safety without compromising overall performance.
arXiv Detail & Related papers (2025-05-24T08:29:50Z) - Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data [55.31265817705997]
We propose a conceptual space to characterize diverse corrupted-data unlearning tasks in vision classifiers. We then propose a novel method, Redirection for Erasing Memory (REM), whose key feature is that corrupted data are redirected to dedicated neurons introduced at unlearning time. REM performs strongly across the space of tasks, in contrast to prior SOTA methods that fail outside the regions for which they were designed.
arXiv Detail & Related papers (2025-05-23T10:47:27Z) - What should a neuron aim for? Designing local objective functions based on information theory [41.39714023784306]
We show how self-organized artificial neurons can be achieved by designing bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition. Our work advances a principled information-theoretic foundation for local learning strategies.
arXiv Detail & Related papers (2024-12-03T14:45:46Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates catastrophic forgetting, outperforming strong variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Learnable Privacy Neurons Localization in Language Models [19.984172475680182]
We introduce a pioneering method for pinpointing PII-sensitive neurons (privacy neurons) within Large Language Models (LLMs).
Our method employs learnable binary weight masks to localize, through adversarial training, the specific neurons that account for the memorization of PII in LLMs.
We validate the method's potential for PII risk mitigation by deactivating the localized privacy neurons.
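The deactivation step described above can be illustrated with a minimal sketch. The tensors and the mask here are toy stand-ins, not the paper's actual model: a learned binary mask marks privacy neurons, and deactivation zeroes their activations.

```python
def deactivate(hidden, mask):
    """Zero out activations wherever mask == 1 (localized privacy neurons)."""
    return [0.0 if m else h for h, m in zip(hidden, mask)]

hidden = [0.8, -1.2, 0.5, 2.0]
mask = [0, 1, 0, 1]  # neurons 1 and 3 were localized as privacy neurons
print(deactivate(hidden, mask))  # [0.8, 0.0, 0.5, 0.0]
```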
arXiv Detail & Related papers (2024-05-16T08:11:08Z) - Dissecting Language Models: Machine Unlearning via Selective Pruning [0.7373617024876725]
This paper introduces a machine unlearning method specifically designed for Large Language Models (LLMs).
We introduce a selective pruning method for LLMs that removes neurons based on their relative importance on a targeted capability compared to overall network performance.
Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others.
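The relative-importance criterion above can be sketched as a ratio score. This is a hedged illustration with made-up numbers, not the paper's actual importance measure: score each neuron by its importance on the target capability divided by its importance for overall performance, then prune the highest-scoring neurons.

```python
def prune_by_relative_importance(target_imp, overall_imp, k):
    """Return the indices of the k neurons most specialized for the
    targeted capability relative to overall network performance."""
    eps = 1e-8  # avoid division by zero for neurons unused overall
    ratios = [t / (o + eps) for t, o in zip(target_imp, overall_imp)]
    return sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)[:k]

target = [0.9, 0.1, 0.7, 0.05]   # importance for the capability to remove
overall = [0.3, 0.8, 0.2, 0.6]   # importance for general performance
print(prune_by_relative_importance(target, overall, 2))  # [2, 0]
```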
arXiv Detail & Related papers (2024-03-02T17:10:44Z) - Bio-Inspired, Task-Free Continual Learning through Activity Regularization [3.5502600490147196]
Continual learning approaches usually require discrete task boundaries.
We take inspiration from neuroscience, where sparse, non-overlapping neuronal representations have been suggested to prevent forgetting.
In addition to sparsity, we introduce lateral recurrent connections within each layer to further protect previously learned representations.
Our method achieves similar performance to well-known CL methods, such as Elastic Weight Consolidation and Synaptic Intelligence, without requiring information about task boundaries.
arXiv Detail & Related papers (2022-12-08T15:14:20Z) - Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
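The sparse-activation idea can be sketched with a simple top-k selection. This is an illustrative toy, not the paper's Bayesian formulation: only the k neurons with the largest activation magnitudes stay active; the rest are silenced.

```python
def topk_sparse(acts, k):
    """Keep only the k largest-magnitude activations; zero the rest."""
    keep = set(sorted(range(len(acts)),
                      key=lambda i: abs(acts[i]), reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]

print(topk_sparse([0.2, -0.9, 0.1, 0.5], 2))  # [0.0, -0.9, 0.0, 0.5]
```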
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
A major historical difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.