Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
- URL: http://arxiv.org/abs/2308.09954v1
- Date: Sat, 19 Aug 2023 09:17:19 GMT
- Title: Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
- Authors: Suhang Wu, Minlong Peng, Yue Chen, Jinsong Su, Mingming Sun
- Abstract summary: Eva-KELLM is a new benchmark for evaluating knowledge editing of large language models.
Experimental results indicate that the current methods for knowledge editing using raw documents are not effective in yielding satisfactory results.
- Score: 54.22416829200613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) possess a wealth of knowledge encoded in their
parameters. However, this knowledge may become outdated or unsuitable over
time. As a result, there has been a growing interest in knowledge editing for
LLMs and evaluating its effectiveness. Existing studies primarily focus on
knowledge editing using factual triplets, which not only incur high costs for
collection but also struggle to express complex facts. Furthermore, these
studies are often limited in their evaluation perspectives. In this paper, we
propose Eva-KELLM, a new benchmark for evaluating knowledge editing of LLMs.
This benchmark includes an evaluation framework and a corresponding dataset.
Under our framework, we first ask the LLM to perform knowledge editing using
raw documents, which provides a more convenient and universal approach compared
to using factual triplets. We then evaluate the updated LLM from multiple
perspectives. In addition to assessing the effectiveness of knowledge editing
and the retention of unrelated knowledge from conventional studies, we further
test the LLM's ability in two aspects: 1) Reasoning with the altered knowledge,
aiming for the LLM to genuinely learn the altered knowledge instead of simply
memorizing it. 2) Cross-lingual knowledge transfer, where the LLM updated with
raw documents in one language should be capable of handling queries from
another language. To facilitate further research, we construct and release the
corresponding dataset. Using this benchmark, we investigate the effectiveness
of several commonly-used knowledge editing methods. Experimental results
indicate that the current methods for knowledge editing using raw documents are
not effective in yielding satisfactory results, particularly when it comes to
reasoning with altered knowledge and cross-lingual knowledge transfer.
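To make the evaluation protocol concrete, here is a minimal sketch of how an Eva-KELLM-style harness could be organized: apply an edit from a raw document, then probe the updated model from the four perspectives described above. The data-class fields, the `edit_fn`/`answer_fn` callables, and the substring-match scoring are illustrative assumptions, not the benchmark's released code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalItem:
    """One benchmark item: the raw document carrying the altered knowledge,
    plus probe queries (question -> expected answer) for each perspective.
    Field names are hypothetical."""
    raw_document: str
    edit_queries: Dict[str, str]          # direct recall of the altered facts
    retention_queries: Dict[str, str]     # unrelated facts that must stay intact
    reasoning_queries: Dict[str, str]     # require inference over the altered facts
    crosslingual_queries: Dict[str, str]  # same facts queried in another language


def evaluate(edit_fn: Callable[[str], None],
             answer_fn: Callable[[str], str],
             items: List[EvalItem]) -> Dict[str, float]:
    """Apply each raw-document edit, then score the updated model from the
    four perspectives (substring match is only a placeholder metric)."""
    correct = {"edit": 0, "retention": 0, "reasoning": 0, "crosslingual": 0}
    total = dict.fromkeys(correct, 0)
    for item in items:
        edit_fn(item.raw_document)  # knowledge editing with a raw document
        perspectives = {
            "edit": item.edit_queries,
            "retention": item.retention_queries,
            "reasoning": item.reasoning_queries,
            "crosslingual": item.crosslingual_queries,
        }
        for name, queries in perspectives.items():
            for question, expected in queries.items():
                total[name] += 1
                if expected.lower() in answer_fn(question).lower():
                    correct[name] += 1
    return {name: correct[name] / total[name] if total[name] else 0.0
            for name in correct}
```

In practice, `edit_fn` would wrap whichever editing method is under test (for example, fine-tuning on the raw document) and `answer_fn` would wrap generation from the updated model.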
Related papers
- Cross-Lingual Multi-Hop Knowledge Editing -- Benchmarks, Analysis and a Simple Contrastive Learning based Approach [53.028586843468915]
We propose the Cross-Lingual Multi-Hop Knowledge Editing paradigm for measuring and analyzing the performance of various SoTA knowledge editing techniques in a cross-lingual setup.
Specifically, we create a parallel cross-lingual benchmark, CROLIN-MQUAKE, for measuring knowledge editing capabilities.
Following this, we propose a significantly improved system for cross-lingual multi-hop knowledge editing, CLEVER-CKE.
arXiv Detail & Related papers (2024-07-14T17:18:16Z)
- Editing Conceptual Knowledge for Large Language Models [65.38231526537476]
This paper pioneers the investigation of editing conceptual knowledge for Large Language Models (LLMs).
We construct a novel benchmark dataset ConceptEdit and establish a suite of new metrics for evaluation.
Experimental results reveal that, although existing editing methods can efficiently modify concept-level definitions to some extent, they also have the potential to distort the related instantial knowledge.
arXiv Detail & Related papers (2024-03-10T16:57:10Z)
- Learning to Edit: Aligning LLMs with Knowledge Editing [101.96620267293731]
We propose a Learning to Edit (LTE) framework, focusing on teaching large language models to apply updated knowledge to input questions.
LTE features a two-phase process, the first of which, the Alignment Phase, fine-tunes LLMs on a meticulously curated parallel dataset to make reliable, in-scope edits.
We demonstrate LTE's superiority in knowledge editing performance, robustness in both batch and sequential editing, minimal interference on general tasks, and rapid editing speeds.
arXiv Detail & Related papers (2024-02-19T07:45:17Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Unveiling the Pitfalls of Knowledge Editing for Large Language Models [41.83423510576848]
It is still unclear whether knowledge editing might introduce side effects that pose potential risks.
This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for Large Language Models.
Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences.
arXiv Detail & Related papers (2023-10-03T15:10:46Z)
- Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
The effect of editing in a source language on a different target language remains unknown (a minimal illustrative check appears after this list).
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z)
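As a companion to the cross-lingual entries above, here is a minimal, hypothetical illustration of a cross-lingual multi-hop check: the edit is applied with an English statement and the updated model is probed with a two-hop question in Chinese. The example fact, the question, and the `edit_fn`/`answer_fn` helpers reuse the conventions from the earlier sketch and are invented for illustration, not drawn from any dataset.

```python
# Hypothetical cross-lingual, two-hop probe in the spirit of the multi-hop
# cross-lingual benchmarks listed above; the fact and question are invented.
edited_fact_en = "The headquarters of ExampleCorp is located in Lyon."  # hop 1 (edited, English)
two_hop_query_zh = "ExampleCorp 的总部位于哪个国家？"  # "In which country is ExampleCorp headquartered?"
accepted_answers = ("France", "法国")  # hop 2 relies on unedited knowledge that Lyon is in France


def crosslingual_multihop_check(edit_fn, answer_fn) -> bool:
    edit_fn(edited_fact_en)               # edit applied in the source language
    answer = answer_fn(two_hop_query_zh)  # multi-hop question asked in the target language
    return any(candidate in answer for candidate in accepted_answers)
```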
This list is automatically generated from the titles and abstracts of the papers on this site.