Consistency-Aware Editing for Entity-level Unlearning in Language Models
- URL: http://arxiv.org/abs/2601.08840v1
- Date: Fri, 19 Dec 2025 15:18:07 GMT
- Title: Consistency-Aware Editing for Entity-level Unlearning in Language Models
- Authors: Xiaoqi Han, Víctor Gutiérrez-Basulto, Ru Li, Xiaoli Li, Jiye Liang, Jeff Z. Pan,
- Abstract summary: We introduce a novel consistency-aware editing (CAE) framework for entity-level unlearning.<n>CAE aggregates a diverse set of prompts related to a target entity, including its attributes, relations, and adversarial paraphrases.<n>It then jointly learns a low-rank update guided by a consistency regularizer that aligns the editing directions across prompts.
- Score: 53.522931419965424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) risk retaining sensitive, copyrighted, or harmful information from their training data. Entity-level unlearning addresses this issue by removing all knowledge of a specific entity while preserving the model's overall capabilities. Existing approaches typically rely on full-model fine-tuning or prompt-based interventions, which can be computationally expensive or brittle when handling paraphrased queries. Recently, model editing has emerged as an efficient alternative for updating knowledge in LLMs, offering a promising direction for unlearning. However, existing editing techniques are typically designed for instance-level updates, modifying responses to specific attributes of an entity rather than eliminating all knowledge associated with the entity. In this paper, we investigate how editing techniques can be adapted for effective and efficient entity-level unlearning. To this end, we introduce a novel consistency-aware editing (CAE) framework. CAE aggregates a diverse set of prompts related to a target entity, including its attributes, relations, and adversarial paraphrases. It then jointly learns a low-rank update guided by a consistency regularizer that aligns the editing directions across prompts. This promotes robust and comprehensive forgetting while minimizing interference with unrelated knowledge. We further examine where different entities are stored within the model and how many diverse prompts are needed for successful unlearning. We evaluate CAE on two challenging benchmarks, RWKU and ToFU, and demonstrate that it (i) provides insights into how entity-level knowledge is internally represented and deleted in LLMs, (ii) significantly improves forgetting accuracy and robustness over traditional unlearning and editing baselines, and (iii) enables scalable entity removal using only tens of carefully selected prompts.
Related papers
- An Information-Theoretic Framework for Robust Large Language Model Editing [17.984683741974063]
Large Language Models (LLMs) have become indispensable tools in science, technology, and society.<n>Errors or outdated information within these models can undermine their accuracy and restrict their safe deployment.<n>We introduce a novel framework for editing LLMs, grounded in information bottleneck theory.<n>We present the Information Bottleneck Knowledge Editor (IBKE), which leverages compact latent representations to guide gradient-based updates.
arXiv Detail & Related papers (2025-12-18T06:21:17Z) - Representation Interventions Enable Lifelong Unstructured Knowledge Control [54.86207134539453]
Large language models (LLMs) often produce incorrect or outdated content. Updating their knowledge efficiently and accurately without costly retraining is a major challenge.<n>We introduce RILKE, a robust and scalable method that treats knowledge control as interventions within the model's representation space.<n>During training, RILKE learns paraphrase-robust and edit-localized modules that limit each update to a low-dimensional subspace to minimize cross-edit interference.<n>In inference, a query-adaptive router selects the appropriate module to guide the model's generation.
arXiv Detail & Related papers (2025-11-25T22:15:00Z) - SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models [96.81401797908835]
We introduce SAKE, the first benchmark specifically designed for editing auditory attribute knowledge in Large Audio-Language Models.<n>We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability.<n>Results highlight challenges such as preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates.
arXiv Detail & Related papers (2025-10-19T16:22:09Z) - InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing [86.17245523439514]
In-context learning is a promising editing method by comprehending edit information through context encoding.<n>This method is constrained by the limited context window of large language models.<n>We propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts.
arXiv Detail & Related papers (2025-05-28T09:20:18Z) - Disentangling Knowledge Representations for Large Language Model Editing [38.244171146682206]
We propose DiKE, a novel approach that Disentangles Knowledge representations for LLM Editing.<n>DiKE consists of two key components: a Knowledge Representation Disentanglement (KRD) module that decomposes the subject representation into target-knowledgerelated and -unrelated components, and a Knowledge Edit (DKE) module that updates only the target-related component while explicitly preserving the unrelated one.<n>To rigorously evaluate fine-grained irrelevant knowledge preservation, we construct FINE-KED, a new benchmark comprising fine-grained irrelevant knowledge at different levels of relational similarity to the edited knowledge.
arXiv Detail & Related papers (2025-05-24T16:24:04Z) - Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning [38.018263569983226]
We provide an evaluation of ten model editing methods along four dimensions: reliability, generalization, locality, and portability.<n>We then propose a straightforward method called Selective Contextual Reasoning (SCR) for knowledge updating.
arXiv Detail & Related papers (2025-03-07T08:04:25Z) - Has this Fact been Edited? Detecting Knowledge Edits in Language Models [5.260519479124422]
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training.<n>Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models.<n>We propose a novel task: detecting edited knowledge in language models.
arXiv Detail & Related papers (2024-05-04T22:02:24Z) - Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge of large language models.<n>We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching.<n>We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$oplus$OS on the CounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.