Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
- URL: http://arxiv.org/abs/2405.15349v2
- Date: Fri, 18 Oct 2024 04:32:49 GMT
- Title: Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
- Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng,
- Abstract summary: A significant portion of real-world knowledge is stored in an unstructured format.
Techniques like local layer key-value storage and term-driven optimization are not effective for handling unstructured knowledge.
We propose a novel Unstructured Knowledge Editing method, namely UnKE, which extends previous assumptions in the layer dimension and token dimension.
- Score: 65.10456412127405
- License:
- Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. Techniques like local layer key-value storage and term-driven optimization, as used in previous methods like MEMIT, are not effective for handling unstructured knowledge. To address these challenges, we propose a novel Unstructured Knowledge Editing method, namely UnKE, which extends previous assumptions in the layer dimension and token dimension. Firstly, in the layer dimension, we propose non-local block key-value storage to replace local layer key-value storage, increasing the representation ability of key-value pairs and incorporating attention layer knowledge. Secondly, in the token dimension, we replace term-driven optimization with cause-driven optimization, which edits the last token directly while preserving context, avoiding the need to locate terms and preventing the loss of context information. Results on newly proposed unstructured knowledge editing dataset (UnKEBench) and traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines. In addition, UnKE has robust batch editing and sequential editing capabilities.
Related papers
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs)
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z) - StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models [41.45831411548188]
StruEdit consistently delivers the highest accuracy with lowest latency compared with other knowledge editing methods.
Results show that StruEdit consistently delivers the highest accuracy with lowest latency compared with other knowledge editing methods.
arXiv Detail & Related papers (2024-09-16T09:48:56Z) - Structure-aware Domain Knowledge Injection for Large Language Models [37.089378357827826]
This paper introduces a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists.
It significantly reduces the training corpus requirement to a mere 0.3%, while achieving an impressive 50% of traditional knowledge injection performance.
Our method demonstrates the potential of comparable improvement against the state-of-the-art MMedLM2 on MMedBench, while significantly reducing the training costs to 5%.
arXiv Detail & Related papers (2024-07-23T12:38:48Z) - Detecting Edited Knowledge in Language Models [5.260519479124422]
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training.
Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models.
We propose a novel task: detecting edited knowledge in language models.
arXiv Detail & Related papers (2024-05-04T22:02:24Z) - Stable Knowledge Editing in Large Language Models [68.98582618305679]
We introduce StableKE, a knowledge editing method based on knowledge augmentation rather than knowledge localization.
To overcome the expense of human labeling, StableKE integrates two automated knowledge augmentation strategies.
StableKE surpasses other knowledge editing methods, demonstrating stability both edited knowledge and multi-hop knowledge.
arXiv Detail & Related papers (2024-02-20T14:36:23Z) - EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries [69.72012539060731]
We introduce a theoretical framework for efficient knowledge editing (KE) in large language models (LLMs)
We propose a novel task of event-based knowledge editing that pairs facts with event descriptions.
We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models.
arXiv Detail & Related papers (2024-02-17T16:34:50Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge of large language models (LLMs)
We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching.
We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$oplus$OS on the textscCounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS)
We construct a large-scale complex scene dataset (textbfOVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.