Related papers: Detecting Edited Knowledge in Language Models

URL: http://arxiv.org/abs/2405.02765v2
Date: Mon, 1 Jul 2024 19:20:58 GMT
Title: Detecting Edited Knowledge in Language Models
Authors: Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert
Abstract summary: Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models. We propose a novel task: detecting edited knowledge in language models.
Score: 5.260519479124422
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved by a prompt from an edited model, the objective is to classify the knowledge as either unedited (based on the pre-training), or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Last, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.

Related papers

Latent Knowledge Scalpel: Precise and Massive Knowledge Editing for Large Language Models [3.834827405473377]
Large Language Models (LLMs) often retain inaccurate or outdated information from pre-training, leading to incorrect predictions or biased outputs during inference. We introduce the Latent Knowledge Scalpel (LKS), an LLM editor that manipulates the latent knowledge of specific entities via a lightweight hypernetwork to enable precise and large-scale editing. Experiments conducted on Llama-2 and Mistral show even with the number of simultaneous edits reaching 10,000, LKS effectively performs knowledge editing while preserving the general abilities of the edited LLMs.
arXiv Detail & Related papers (2025-08-01T03:51:43Z)
K-Edit: Language Model Editing with Contextual Knowledge Awareness [71.73747181407323]
Knowledge-based model editing enables precise modifications to the weights of large language models. We present K-Edit, an effective approach to generating contextually consistent knowledge edits.
arXiv Detail & Related papers (2025-02-15T01:35:13Z)
AnyEdit: Edit Any Knowledge Encoded in Language Models [69.30638272162267]
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs). It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs. It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv Detail & Related papers (2025-02-08T16:18:37Z)
Identifying Knowledge Editing Types in Large Language Models [11.051687980330286]
Knowledge editing has emerged as an efficient technique for updating the knowledge of large language models (LLMs). There is a lack of effective measures to prevent the malicious misuse of this technique, which could lead to harmful edits in LLMs. We introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different types of edits in LLMs.
arXiv Detail & Related papers (2024-09-29T11:29:57Z)
How Well Can Knowledge Edit Methods Edit Perplexing Knowledge? [18.022428746019582]
This study investigates the capability of knowledge editing methods to incorporate new knowledge with varying degrees of "perplexingness". We find significant negative correlations between the "perplexingness" of the new knowledge and the edit efficacy across all 12 scenarios. Further exploration into the influence of knowledge hierarchy on editing outcomes indicates that knowledge positioned at higher hierarchical levels is more challenging to modify in some scenarios.
arXiv Detail & Related papers (2024-06-25T03:41:02Z)
Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
On the Robustness of Editing Large Language Models [57.477943944826904]
Large language models (LLMs) have played a pivotal role in building communicative AI, yet they encounter the challenge of efficient updates. This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI.
arXiv Detail & Related papers (2024-02-08T17:06:45Z)
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge of large language models (LLMs). We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$\oplus$OS on the CounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z)
A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches. We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
Knowledge-Augmented Language Model Verification [68.6099592486075]
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. We propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier. Our results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs.
arXiv Detail & Related papers (2023-10-19T15:40:00Z)
Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts). We find that existing methods for updating knowledge show little propagation of injected knowledge. Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge. Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training. We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.