AKEW: Assessing Knowledge Editing in the Wild
- URL: http://arxiv.org/abs/2402.18909v2
- Date: Thu, 10 Oct 2024 05:30:03 GMT
- Title: AKEW: Assessing Knowledge Editing in the Wild
- Authors: Xiaobao Wu, Liangming Pan, William Yang Wang, Anh Tuan Luu
- Abstract summary: AKEW (Assessing Knowledge Editing in the Wild) is a new practical benchmark for knowledge editing.
It fully covers three editing settings of knowledge updates: structured facts, unstructured texts as facts, and extracted triplets.
Through extensive experiments, we demonstrate the considerable gap between state-of-the-art knowledge-editing methods and practical scenarios.
- Score: 79.96813982502952
- Abstract: Knowledge editing injects knowledge updates into language models to keep them correct and up-to-date. However, its current evaluations deviate significantly from practice: their knowledge updates solely consist of structured facts derived from meticulously crafted datasets, instead of practical sources -- unstructured texts like news articles, and they often overlook practical real-world knowledge updates. To address these issues, in this paper we propose AKEW (Assessing Knowledge Editing in the Wild), a new practical benchmark for knowledge editing. AKEW fully covers three editing settings of knowledge updates: structured facts, unstructured texts as facts, and extracted triplets. It further introduces new datasets featuring both counterfactual and real-world knowledge updates. Through extensive experiments, we demonstrate the considerable gap between state-of-the-art knowledge-editing methods and practical scenarios. Our analyses further highlight key insights to motivate future research for practical knowledge editing.
Related papers
- Event-level Knowledge Editing [53.767465515537545]
Existing work edits large language models (LLMs) at the level of factual knowledge triplets.
We propose a new task setting: event-level knowledge editing, which directly edits new events into LLMs.
We construct a high-quality event-level editing benchmark ELKEN, consisting of 1,515 event edits, 6,449 questions about factual knowledge, and 10,150 questions about future tendencies.
arXiv Detail & Related papers (2024-02-20T15:36:41Z)
- Stable Knowledge Editing in Large Language Models [68.98582618305679]
We introduce StableKE, a knowledge editing method based on knowledge augmentation rather than knowledge localization.
To overcome the expense of human labeling, StableKE integrates two automated knowledge augmentation strategies.
StableKE surpasses other knowledge editing methods, demonstrating the stability of both edited knowledge and multi-hop knowledge.
arXiv Detail & Related papers (2024-02-20T14:36:23Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- History Matters: Temporal Knowledge Editing in Large Language Model [42.74144542674756]
We introduce the task of Temporal Knowledge Editing (TKE) and establish a benchmark AToKe to evaluate current model editing methods.
We find that while existing model editing methods are effective at making models remember new knowledge, the edited model catastrophically forgets historical knowledge.
To address this gap, we propose a simple and general framework termed Multi-Editing with Time Objective (METO) for enhancing existing editing models.
arXiv Detail & Related papers (2023-12-09T07:51:56Z)
- Assessing Knowledge Editing in Language Models via Relation Perspective [21.64869056276927]
This paper constructs a new benchmark named RaKE, which focuses on relation-based knowledge editing.
We establish a suite of innovative metrics for evaluation and conduct comprehensive experiments involving various knowledge editing baselines.
Our results confirm that relation-related knowledge is stored not only in the FFN layers but also in the attention layers.
arXiv Detail & Related papers (2023-11-15T15:44:42Z)
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z)
- Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs [54.22416829200613]
Eva-KELLM is a new benchmark for evaluating knowledge editing of large language models.
Experimental results indicate that current methods for knowledge editing with raw documents do not yield satisfactory results.
arXiv Detail & Related papers (2023-08-19T09:17:19Z)