MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
- URL: http://arxiv.org/abs/2305.14795v2
- Date: Sun, 29 Oct 2023 20:28:17 GMT
- Title: MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
- Authors: Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen
- Abstract summary: We present a benchmark, MQuAKE, comprising multi-hop questions that assess whether edited models correctly answer questions whose answers should change as an entailed consequence of the edited facts.
We propose a memory-based approach, MeLLo, which stores all edited facts externally while prompting the language model iteratively to generate answers consistent with the edited facts.
- Score: 80.69639629733484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The information stored in large language models (LLMs) falls out of date
quickly, and retraining from scratch is often not an option. This has recently
given rise to a range of techniques for injecting new facts through updating
model weights. Current evaluation paradigms are extremely limited, mainly
validating the recall of edited facts, but changing one fact should cause
rippling changes to the model's related beliefs. If we edit the UK Prime
Minister to now be Rishi Sunak, then we should get a different answer to "Who is
married to the British Prime Minister?" In this work, we present a benchmark,
MQuAKE (Multi-hop Question Answering for Knowledge Editing), comprising
multi-hop questions that assess whether edited models correctly answer
questions where the answer should change as an entailed consequence of edited
facts. While we find that current knowledge-editing approaches can recall
edited facts accurately, they fail catastrophically on the constructed
multi-hop questions. We thus propose a simple memory-based approach, MeLLo,
which stores all edited facts externally while prompting the language model
iteratively to generate answers that are consistent with the edited facts.
While MQuAKE remains challenging, we show that MeLLo scales well with LLMs (up
to 175B) and outperforms previous model editors by a large margin.
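The abstract describes MeLLo only at a high level: edited facts live in an external memory, and the model is prompted iteratively so that each intermediate answer agrees with those facts. The toy sketch below illustrates that loop under loud assumptions: the "language model" is a hypothetical lookup table of (subject, relation) to object facts, the multi-hop question is already decomposed into a chain of relations, and retrieval is exact key matching rather than the retrieval and prompting a real system would use. None of the names come from the paper.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, str]  # (subject, relation)

# Hypothetical stand-in for the unedited model's knowledge (illustrative toy facts).
MODEL_FACTS: Dict[Fact, str] = {
    ("United Kingdom", "head of government"): "Boris Johnson",
    ("Boris Johnson", "spouse"): "Carrie Johnson",
    ("Rishi Sunak", "spouse"): "Akshata Murty",
}

def model_answer(subject: str, relation: str) -> Optional[str]:
    """Toy 'LLM' call: the model's tentative answer for a single hop."""
    return MODEL_FACTS.get((subject, relation))

def answer_multi_hop(start: str, relations: List[str],
                     memory: Dict[Fact, str]) -> Optional[str]:
    """Resolve a chain of hops, letting edited facts in external memory override the model."""
    entity: Optional[str] = start
    for relation in relations:
        if entity is None:
            return None  # the chain broke on an earlier hop
        tentative = model_answer(entity, relation)
        # Simplified retrieval: exact lookup of an edited fact for this hop.
        edited = memory.get((entity, relation))
        # A retrieved edited fact overrides any conflicting tentative answer.
        entity = edited if edited is not None else tentative
    return entity

# "Who is married to the British Prime Minister?" as a two-hop chain.
question = ["head of government", "spouse"]
edits = {("United Kingdom", "head of government"): "Rishi Sunak"}

print(answer_multi_hop("United Kingdom", question, {}))     # Carrie Johnson
print(answer_multi_hop("United Kingdom", question, edits))  # Akshata Murty
```

With an empty memory the chain follows the model's stale first hop. With the single edit, only the first hop is overridden, and the second hop is answered from the model's own knowledge about the new entity; that propagation to a later hop is the kind of entailed consequence MQuAKE probes.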
Related papers
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
Date: 2024-06-27T17:33:03Z
- Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge [93.54427119091174]
We propose outDated ISsue aware deCOding (DISCO) to enhance the performance of edited models on reasoning questions.
We capture the difference in the probability distribution between the original and edited models.
We then amplify this difference in the edited model's token predictions to alleviate the outdated-knowledge issue.
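The decoding idea above (capture the difference between the original and edited models' next-token distributions, then amplify it) resembles contrastive decoding. A minimal sketch, assuming the amplification is applied to next-token logits as `edited + alpha * (edited - original)`; the exact formula, the weight `alpha`, and the function names are assumptions for illustration, not taken from the paper.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def amplified_logits(edited: List[float], original: List[float],
                     alpha: float) -> List[float]:
    """Push the edited model further in the direction it moved away from the original."""
    return [e + alpha * (e - o) for e, o in zip(edited, original)]

# Toy two-token vocabulary: index 0 = outdated answer, index 1 = updated answer.
original = [3.0, 1.0]  # unedited model strongly prefers the outdated answer
edited = [2.0, 1.8]    # edited model has only partly shifted toward the new answer

print(softmax(edited))                                       # still favors index 0
print(softmax(amplified_logits(edited, original, alpha=1.0)))  # now favors index 1
```

Here the edit alone leaves the outdated token slightly ahead; amplifying the edit's direction flips the greedy choice. Larger `alpha` pushes harder, at the risk of distorting tokens the edit should not affect.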
Date: 2024-06-05T03:00:15Z
- WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models [78.22291694903659]
Large language models (LLMs) need knowledge updates to keep pace with ever-growing world knowledge and to correct hallucinated responses.
Where the updated knowledge resides in memories is a fundamental question for model editing.
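We propose WISE to bridge the gap between memories: a main memory holds the pretrained knowledge, a side memory holds the edited knowledge, only the side memory is edited, and a trained router decides which memory to use for a given query.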
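To illustrate routing between a main (pretrained) memory and a side (edited) memory, which is how the WISE paper describes bridging this gap, here is a toy router for a single linear layer. It is a minimal sketch under assumptions: the routing rule (use side memory when its output deviates from the main memory's output by more than a fixed threshold), the threshold value, and all names are hypothetical; WISE trains its router rather than fixing a threshold by hand.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def route_and_apply(x: Vector, main_w: Matrix, side_w: Matrix,
                    threshold: float) -> Vector:
    """Use the side memory (edited weights) only when it changes this input's output enough."""
    main_out = matvec(main_w, x)
    side_out = matvec(side_w, x)
    deviation = math.sqrt(sum((s - m) ** 2 for s, m in zip(side_out, main_out)))
    return side_out if deviation > threshold else main_out

main_w = [[1.0, 0.0], [0.0, 1.0]]  # pretrained knowledge, never edited
side_w = [[1.0, 0.0], [0.0, 3.0]]  # copy that absorbed an edit along the second input direction

print(route_and_apply([1.0, 0.0], main_w, side_w, threshold=0.5))  # unrelated query -> main memory
print(route_and_apply([0.0, 1.0], main_w, side_w, threshold=0.5))  # edit-related query -> side memory
```

Keeping the main weights untouched is what protects unrelated queries; the router's job is to send only edit-related inputs through the modified copy.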
Date: 2024-05-23T16:35:52Z
- Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models [47.199078631274745]
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge updates.
We propose the Retrieval-Augmented model Editing (RAE) framework tailored for multi-hop question answering.
Date: 2024-03-28T17:47:19Z
- PokeMQA: Programmable knowledge editing for Multi-hop Question Answering [46.80110170981976]
Multi-hop question answering (MQA) is a challenging task for evaluating a machine's comprehension and reasoning abilities.
We propose a framework, Programmable knowledge editing for Multi-hop Question Answering (PokeMQA).
Specifically, we prompt LLMs to decompose knowledge-augmented multi-hop questions, while interacting with a detached, trainable scope detector that modulates the LLM's behavior depending on external conflict signals.
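PokeMQA's scope detector decides whether a decomposed subquestion falls within the scope of a stored edit, so that only in-scope subquestions are answered from the edit. The paper's detector is trainable; the sketch below substitutes a hypothetical word-overlap (Jaccard) score with a fixed threshold purely to show where such a detector sits in the loop. The function names, the scoring rule, and the threshold are assumptions.

```python
import re
from typing import List, Optional, Set

def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic word set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap score in [0, 1]; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def in_scope_edit(subquestion: str, edits: List[str],
                  threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching edited fact if it scores at or above the threshold, else None."""
    q = tokens(subquestion)
    best = max(edits, key=lambda e: jaccard(q, tokens(e)), default=None)
    if best is not None and jaccard(q, tokens(best)) >= threshold:
        return best
    return None

edits = ["The Prime Minister of the United Kingdom is Rishi Sunak"]
# First hop matches the edit; the second hop is out of scope and left to the model.
print(in_scope_edit("Who is the Prime Minister of the United Kingdom?", edits))
print(in_scope_edit("Who is Rishi Sunak married to?", edits))
```

The second subquestion mentions the edited entity but asks about a different relation, so a good detector should report no conflict and let the model answer it; a trained detector is meant to make that distinction far more reliably than word overlap.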
Date: 2023-12-23T08:32:13Z
- Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models [68.03946716358335]
We find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
This is surprising because we would expect that localizing facts to specific model parameters would tell us where to manipulate knowledge in models.
Our results suggest, counterintuitively, that better mechanistic understanding of how pretrained language models work may not always translate to insights about how to best change their behavior.
Date: 2023-01-10T21:26:08Z