Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
- URL: http://arxiv.org/abs/2505.12636v1
- Date: Mon, 19 May 2025 02:44:57 GMT
- Title: Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
- Authors: Jiakuan Xie, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
- Abstract summary: This paper introduces the concept of "superficial editing" to describe this phenomenon. Our comprehensive evaluation reveals that this issue presents a significant challenge to existing algorithms.
- Score: 18.12933371693374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge editing, which aims to update the knowledge encoded in language models, can be deceptive. Although many existing knowledge editing algorithms achieve near-perfect performance on conventional metrics, the edited models remain prone to generating the original knowledge. This paper introduces the concept of "superficial editing" to describe this phenomenon. Our comprehensive evaluation reveals that this issue presents a significant challenge to existing algorithms. Through systematic investigation, we identify and validate two key factors contributing to this issue: (1) the residual stream at the last subject position in earlier layers and (2) specific attention modules in later layers. Notably, certain attention heads in later layers, along with specific left singular vectors in their output matrices, encapsulate the original knowledge and exhibit a causal relationship with superficial editing. Furthermore, we extend our analysis to the task of superficial unlearning, where we observe consistent patterns in the behavior of specific attention heads and their corresponding left singular vectors, thereby demonstrating the robustness and broader applicability of our methodology and conclusions. Our code is available here.
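The abstract associates superficial editing with specific left singular vectors of attention-head output matrices. The following sketch (our own illustration, not the paper's released code; the matrix shape and orientation are assumptions) shows how such vectors can be inspected via SVD and ablated as a causal probe:

```python
import numpy as np

# Hypothetical setup: a single attention head's output projection matrix.
# The orientation (d_model x d_head) is an assumption for illustration.
rng = np.random.default_rng(0)
d_head, d_model = 64, 512
W_O = rng.standard_normal((d_model, d_head))

# SVD: W_O = U @ diag(S) @ Vt. The columns of U are the left singular
# vectors that the paper analyzes.
U, S, Vt = np.linalg.svd(W_O, full_matrices=False)

# Ablating the top-k singular directions removes their contribution,
# a simple way to test whether they carry the original knowledge.
k = 1
W_O_ablated = W_O - U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# The ablated matrix loses exactly k ranks (up to numerical tolerance).
print(np.linalg.matrix_rank(W_O_ablated))  # → 63
```

In practice such a probe would be applied to a specific head's weights in a real model and evaluated by whether the edited model stops reverting to the original fact; the random matrix here only demonstrates the linear-algebra step.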
Related papers
- Unveiling and Eliminating the Shortcut Learning for Locate-Then-Edit Knowledge Editing via Both Subject and Relation Awareness [15.781679300397562]
Knowledge editing aims to alter the target knowledge predicted by large language models. We propose a novel two-stage optimization process that balances the learning of the subject feature and the relation feature.
arXiv Detail & Related papers (2025-06-04T15:06:46Z) - KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models [88.58758610679762]
We introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. We categorize editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. To support fine-grained evaluation, we propose a protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies.
arXiv Detail & Related papers (2025-05-22T14:08:59Z) - GeoEdit: Geometric Knowledge Editing for Large Language Models [52.37408324849593]
Regular updates are essential for maintaining up-to-date knowledge in large language models (LLMs). We propose a novel framework called Geometric Knowledge Editing (GeoEdit). GeoEdit distinguishes between neurons associated with new knowledge updates and those related to general knowledge perturbations. For the remaining neurons, we integrate both old and new knowledge for aligned directions and apply a "forget-then-learn" editing strategy for opposite directions.
arXiv Detail & Related papers (2025-02-27T10:27:48Z) - Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject [49.559994791305535]
Current state-of-the-art editing methods struggle when tasked with editing multiple related knowledge pieces for the same subject. We introduce the S²RKE (Same-Subject Related Knowledge Editing) benchmark. Our experiments reveal that only mainstream locate-then-edit methods, such as ROME and MEMIT, exhibit "related knowledge perturbation".
arXiv Detail & Related papers (2025-02-08T04:47:17Z) - Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge [93.54427119091174]
We propose outDated ISsue aware deCOding to enhance the performance of edited models on reasoning questions.
We capture the difference in the probability distribution between the original and edited models.
We amplify the difference of the token prediction in the edited model to alleviate the outdated issue.
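The decoding idea summarized above can be sketched as a small contrastive-decoding step (our own illustration under assumed names, not the paper's implementation): capture the shift in next-token logits between the edited and original models, then amplify it so tokens favored by the edit win out over outdated ones.

```python
import numpy as np

def amplified_logits(logits_edited, logits_original, alpha=1.5):
    """Boost tokens whose score rose after editing (alpha is a
    hypothetical amplification strength)."""
    return logits_edited + alpha * (logits_edited - logits_original)

# Toy next-token logits over a 3-token vocabulary.
logits_orig = np.array([2.0, 1.0, 0.5])  # original model prefers token 0
logits_edit = np.array([1.0, 2.5, 0.5])  # edited model prefers token 1

adj = amplified_logits(logits_edit, logits_orig)
print(int(np.argmax(adj)))  # → 1: the token favored by the edit
```

With alpha = 0 this reduces to ordinary decoding from the edited model; larger alpha pushes generation further away from the outdated distribution.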
arXiv Detail & Related papers (2024-06-05T03:00:15Z) - Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models [26.516571783335824]
Recent studies have identified side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing.
This survey presents a comprehensive study of these side effects, providing a unified perspective on the challenges of knowledge editing in large language models.
arXiv Detail & Related papers (2024-06-03T15:28:21Z) - EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries [69.72012539060731]
We introduce a theoretical framework for efficient knowledge editing (KE) in large language models (LLMs)
We propose a novel task of event-based knowledge editing that pairs facts with event descriptions.
We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models.
arXiv Detail & Related papers (2024-02-17T16:34:50Z) - Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks [36.292901021210575]
We introduce a novel reasoning-based benchmark, ReCoE (Reasoning-based Counterfactual Editing dataset).
We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit.
All model editing methods show notably low performance on this dataset, especially in certain reasoning schemes.
arXiv Detail & Related papers (2024-01-31T04:12:59Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.