Related papers: ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs

URL: http://arxiv.org/abs/2506.01386v2
Date: Sat, 06 Sep 2025 00:54:52 GMT
Title: ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs
Authors: Manit Baser, Dinil Mon Divakaran, Mohan Gurusamy
Abstract summary: We present ThinkEval, a framework to quantify indirect knowledge leakage and ripple effects in model-editing. ThinkEval builds and employs specialized knowledge graphs to analyze the causal structure of facts before and after editing. We evaluate five editing techniques: AlphaEdit, RECT, ROME, MEMIT, and PRUNE.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Robust model-editing techniques are essential for deploying large language models (LLMs) in practical applications, because they offer cost-effective ways to deal with challenges such as privacy breaches, bias mitigation and the spread of misinformation. For example, an LLM-based healthcare assistant may need to update outdated or incorrect knowledge to prevent harmful recommendations. However, many editing techniques focus on isolated facts and therefore fail to prevent indirect knowledge leakage: the unintended reconstruction of edited-out information through persistent causal links and contextual relationships. To assist users in selecting the right editing technique, we develop and present ThinkEval, a framework to systematically quantify indirect knowledge leakage and ripple effects in model-editing. ThinkEval builds and employs specialized knowledge graphs to analyze the causal structure of facts before and after editing. To support this approach, we present KnowGIC, a benchmark dataset comprising multi-step reasoning paths that precisely measure these complex knowledge transformation effects. We evaluate five editing techniques: AlphaEdit, RECT, ROME, MEMIT, and PRUNE across multiple LLMs. Our results show that these techniques struggle to balance indirect fact suppression with the preservation of related knowledge, compromising the contextual integrity of a model's knowledge. Our dataset is available at: https://anonymous.4open.science/r/KnowGIC.
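To make "indirect knowledge leakage" concrete, here is a minimal sketch, not ThinkEval's actual algorithm: facts are modeled as directed (subject, relation, object) triples, the edited triple is removed, and a breadth-first search looks for multi-hop paths that still connect the edited subject to the edited object. The function name leakage_paths, the max_hops parameter, the directed-edge assumption and the fictional entities are all illustrative assumptions.

```python
from collections import deque

# A fact is a (subject, relation, object) triple.
Triple = tuple[str, str, str]


def leakage_paths(facts: list[Triple], edited: Triple, max_hops: int = 3) -> list[list[Triple]]:
    """Hypothetical helper: return every path of at most max_hops triples that
    still links the edited subject to the edited object once the edited triple
    is deleted. Any such path is a route by which the fact could be re-derived."""
    subj, _, obj = edited
    # Build a directed adjacency list from the facts that survive the edit.
    adj: dict[str, list[tuple[str, str]]] = {}
    for s, r, o in facts:
        if (s, r, o) != edited:
            adj.setdefault(s, []).append((r, o))

    paths: list[list[Triple]] = []
    queue: deque[tuple[str, list[Triple]]] = deque([(subj, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_hops:
            continue
        for rel, nxt in adj.get(node, []):
            step = path + [(node, rel, nxt)]
            if nxt == obj:
                paths.append(step)  # edited-out object is still reachable
            elif all(nxt != s for s, _, _ in step):  # skip already-visited nodes (no cycles)
                queue.append((nxt, step))
    return paths
```

For example, deleting ("Alice", "works_at", "AcmeCorp") still leaks the fact if the graph keeps ("Alice", "manages", "ProjectX") and ("ProjectX", "owned_by", "AcmeCorp"); the search returns that two-hop path. Breadth-first order lists shorter (more easily reconstructed) leakage paths first.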

Related papers

Are We Evaluating the Edit Locality of LLM Model Editing Properly?
We find that existing specificity evaluation protocols are inadequate for assessing edit locality. Existing specificity metrics are weakly correlated with the strength of specificity regularizers. We also find that current metrics lack sufficient sensitivity, rendering them ineffective at distinguishing the specificity performance of different methods.
arXiv (2026-01-24T07:07:21Z)
Retention analysis of edited knowledge after fine-tuning
Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. However, the effect of fine-tuning on previously edited knowledge remains poorly understood.
arXiv (2025-07-14T15:51:19Z)
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Knowledge editing (KE) has emerged as a promising approach to update specific facts in Large Language Models (LLMs) without the need for full retraining. We propose a novel framework called MedEditBench to rigorously evaluate the effectiveness of existing KE methods in the medical domain. Our findings indicate that current KE methods result in only superficial memorization of the injected information, failing to generalize to new scenarios.
arXiv (2025-06-04T02:14:43Z)
Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
Although editing and unlearning seem to be two distinct tasks, we find a tight connection between them. We evaluate whether knowledge editing techniques are strong baselines for LLM unlearning. We propose practical recipes, including self-improvement and query merging, to better adapt editing methods for unlearning applications.
arXiv (2025-05-26T11:39:56Z)
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
We propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting success rates. By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing.
arXiv (2025-03-03T01:30:28Z)
K-Edit: Language Model Editing with Contextual Knowledge Awareness
Knowledge-based model editing enables precise modifications to the weights of large language models. We present K-Edit, an effective approach to generating contextually consistent knowledge edits.
arXiv (2025-02-15T01:35:13Z)
AnyEdit: Edit Any Knowledge Encoded in Language Models
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs). It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs. It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv (2025-02-08T16:18:37Z)
Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject
Current state-of-the-art editing methods struggle when tasked with editing multiple related knowledge pieces for the same subject. We introduce the S²RKE (Same-Subject Related Knowledge Editing) benchmark. Our experiments reveal that only mainstream locate-then-edit methods, such as ROME and MEMIT, exhibit "related knowledge perturbation".
arXiv (2025-02-08T04:47:17Z)
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge. We introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters. We theoretically prove that this projection ensures the output of post-edited LLMs remains unchanged when queried about the preserved knowledge.
arXiv (2024-10-03T10:06:27Z)
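A minimal NumPy sketch of the null-space idea, assuming a linear layer W maps key vectors to values, the preserved-knowledge keys are stacked as the columns of a matrix K0, and an edit adds an update Δ to W. If Δ is right-multiplied by a projector P onto the null space of K0 K0ᵀ, then (ΔP)K0 = 0, so (W + ΔP)K0 = WK0 and outputs for the preserved keys are unchanged. The function names and the tol threshold are illustrative assumptions, not AlphaEdit's released code; in practice K0 K0ᵀ is often numerically full rank and a looser eigenvalue threshold would be needed.

```python
import numpy as np


def null_space_projector(K0: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Return a (d x d) projector onto the null space of K0 @ K0.T.

    K0 has shape (d, n): each column is a key vector for preserved knowledge.
    tol is a hypothetical relative threshold for treating an eigenvalue as zero.
    """
    # K0 K0^T is symmetric PSD, so its left singular vectors are eigenvectors.
    U, S, _ = np.linalg.svd(K0 @ K0.T)
    # Keep only directions whose eigenvalue is (numerically) zero.
    U_null = U[:, S <= tol * max(S.max(), 1.0)]
    return U_null @ U_null.T


def project_update(delta: np.ndarray, K0: np.ndarray) -> np.ndarray:
    """Project a weight update delta (d_out x d) so that (delta @ P) @ K0 == 0."""
    return delta @ null_space_projector(K0)
```

Usage: with rng = np.random.default_rng(0), K0 = rng.standard_normal((6, 3)) and delta = rng.standard_normal((4, 6)), the projected update project_update(delta, K0) satisfies np.allclose(project_update(delta, K0) @ K0, 0) up to floating-point error. The projection removes only the components of Δ that would touch the preserved keys; the rest of the update survives and can still encode the new fact.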
How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?
Large language models (LLMs) have demonstrated remarkable capabilities, but updating their knowledge post-training remains a critical challenge. We introduce the concept of "perplexingness": the degree to which new knowledge conflicts with an LLM's learned conceptual hierarchies and categorical relationships. Our analysis reveals that edits involving more abstract concepts (hypernyms) generally exhibit higher perplexingness and are more resistant to modification than their specific counterparts (hyponyms).
arXiv (2024-06-25T03:41:02Z)
Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge
We propose outDated ISsue aware deCOding (DISCO) to enhance the performance of edited models on reasoning questions. We capture the difference in the probability distributions of the original and edited models, and amplify this difference in the edited model's token predictions to alleviate the outdated issue.
arXiv (2024-06-05T03:00:15Z)
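The "amplify the difference" step can be illustrated with a contrastive-style adjustment. This is a minimal sketch under an assumed formula, not the paper's exact method: the edited model's next-token logits are pushed further from the original model's logits by alpha times their difference, then renormalized with a softmax. The function amplified_distribution and the strength parameter alpha are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def amplified_distribution(
    logits_orig: list[float], logits_edit: list[float], alpha: float = 1.0
) -> list[float]:
    """Hypothetical adjustment: move the edited model's logits further away from
    the original model's logits (logit-space form assumed for illustration).

    alpha = 0 recovers the edited model's own distribution unchanged.
    """
    adjusted = [e + alpha * (e - o) for o, e in zip(logits_orig, logits_edit)]
    return softmax(adjusted)
```

As a toy example over the vocabulary ["Paris", "Rome"], suppose the original model prefers the outdated answer (logits [2.0, 1.0]) while the edited model only narrowly prefers the new one (logits [1.5, 1.6]). With alpha = 1.0 the adjusted logits become [1.0, 2.2], so the probability of "Rome" rises well above what the edited model alone assigns. Larger alpha values strengthen the correction but risk distorting tokens the edit never touched.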
Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models
Recent studies have identified side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing a unified perspective on the challenges of knowledge editing in large language models.
arXiv (2024-06-03T15:28:21Z)
Robust and Scalable Model Editing for Large Language Models
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv (2024-03-26T06:57:23Z)
AKEW: Assessing Knowledge Editing in the Wild
AKEW (Assessing Knowledge Editing in the Wild) is a new practical benchmark for knowledge editing. It fully covers three editing settings of knowledge updates: structured facts, unstructured texts as facts, and extracted triplets. Through extensive experiments, we demonstrate the considerable gap between state-of-the-art knowledge-editing methods and practical scenarios.
arXiv (2024-02-29T07:08:34Z)
Knowledge Graph Enhanced Large Language Model Editing
Large language models (LLMs) are pivotal in advancing natural language processing (NLP) tasks. Existing editing methods struggle to track and incorporate changes in knowledge associated with edits. We propose a novel model editing method that leverages knowledge graphs for enhancing LLM editing, namely GLAME.
arXiv (2024-02-21T07:52:26Z)
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
We introduce a novel reasoning-based benchmark, ReCoE (Reasoning-based Counterfactual Editing dataset). We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. All model editing methods show notably low performance on this dataset, especially in certain reasoning schemes.
arXiv (2024-01-31T04:12:59Z)
A Comprehensive Study of Knowledge Editing for Large Language Models
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches. We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv (2024-01-02T16:54:58Z)
Unveiling the Pitfalls of Knowledge Editing for Large Language Models
It remains unclear whether knowledge editing introduces side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for Large Language Models. Experimental results demonstrate that knowledge editing can inadvertently cause unintended consequences.
arXiv (2023-10-03T15:10:46Z)
Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
Eva-KELLM is a new benchmark for evaluating knowledge editing of large language models. Experimental results indicate that the current methods for knowledge editing using raw documents are not effective in yielding satisfactory results.
arXiv (2023-08-19T09:17:19Z)