MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
- URL: http://arxiv.org/abs/2305.14795v3
- Date: Mon, 9 Sep 2024 04:38:16 GMT
- Title: MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
- Authors: Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen
- Abstract summary: We present a benchmark, MQuAKE, comprising multi-hop questions that assess whether edited models correctly answer questions whose answers should change as an entailed consequence of edited facts.
We propose a memory-based approach, MeLLo, which stores all edited facts externally while prompting the language model iteratively to generate answers consistent with the edited facts.
- Score: 75.21713251369225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option. This has recently given rise to a range of techniques for injecting new facts through updating model weights. Current evaluation paradigms are extremely limited, mainly validating the recall of edited facts, but changing one fact should cause rippling changes to the model's related beliefs. If we edit the UK Prime Minister to now be Rishi Sunak, then we should get a different answer to Who is married to the British Prime Minister? In this work, we present a benchmark, MQuAKE (Multi-hop Question Answering for Knowledge Editing), comprising multi-hop questions that assess whether edited models correctly answer questions where the answer should change as an entailed consequence of edited facts. While we find that current knowledge-editing approaches can recall edited facts accurately, they fail catastrophically on the constructed multi-hop questions. We thus propose a simple memory-based approach, MeLLo, which stores all edited facts externally while prompting the language model iteratively to generate answers that are consistent with the edited facts. While MQuAKE remains challenging, we show that MeLLo scales well with LLMs (e.g., OpenAI GPT-3.5-turbo) and outperforms previous model editors by a large margin.
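To make MeLLo's loop concrete, here is a minimal Python sketch under stated assumptions: `llm_decompose`, `llm_answer`, and `llm_contradicts` are hypothetical stand-ins for prompted model calls, and retrieval is naive string similarity rather than the dense retriever a real system would use.

```python
# A minimal sketch of a MeLLo-style loop, not the paper's implementation.
# llm_decompose, llm_answer, and llm_contradicts are assumed stand-ins for
# prompted language-model calls; retrieval is naive string similarity.
from difflib import SequenceMatcher

def retrieve(memory: list[str], query: str) -> str:
    # Return the stored edited fact most similar to the subquestion.
    return max(memory, key=lambda fact: SequenceMatcher(None, fact, query).ratio())

def mello_answer(question: str, memory: list[str],
                 llm_decompose, llm_answer, llm_contradicts,
                 max_hops: int = 4) -> str | None:
    answer = None
    for _ in range(max_hops):
        subq = llm_decompose(question, answer)  # next subquestion, or None when done
        if subq is None:
            return answer
        tentative = llm_answer(subq)   # the model's own, possibly stale, answer
        fact = retrieve(memory, subq)  # closest edited fact in external memory
        if llm_contradicts(tentative, fact):
            # The retrieved edit overrides the model's parametric knowledge,
            # assuming facts are stored as "<subject> ... is <object>." strings.
            tentative = fact.rsplit(" is ", 1)[-1].rstrip(".")
        answer = tentative
    return answer
```

The point of the design, per the abstract, is that no weights are touched: every edit lives in the external memory, and the model is only steered at inference time.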
Related papers
- MQA-KEAL: Multi-hop Question Answering under Knowledge Editing for Arabic Language [7.488965571323756]
We propose Multi-hop Question Answering under Knowledge Editing for Arabic Language (MQA-KEAL)
MQA-KEAL stores knowledge edits as structured knowledge units in the external memory.
We also contribute MQuAKE-AR (an Arabic translation of the English benchmark MQuAKE) as well as a new benchmark, MQA-AEVAL, for rigorous evaluation of MQA under knowledge editing for the Arabic language.
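The phrase "structured knowledge units" suggests a record shape like the sketch below; the field names and the (subject, relation) index are illustrative guesses, not details taken from the paper.

```python
# Hypothetical shape of a structured knowledge unit for an MQA-KEAL-style
# external memory; field names are illustrative, not the paper's schema.
from dataclasses import dataclass

@dataclass
class KnowledgeUnit:
    subject: str      # e.g. "United Kingdom"
    relation: str     # e.g. "head of government"
    new_object: str   # post-edit value, e.g. "Rishi Sunak"
    old_object: str   # pre-edit value, kept so conflicts can be detected

# The external memory is simply an index over units, keyed by (subject, relation).
memory: dict[tuple[str, str], KnowledgeUnit] = {}

def add_edit(unit: KnowledgeUnit) -> None:
    memory[(unit.subject, unit.relation)] = unit

def lookup(subject: str, relation: str) -> KnowledgeUnit | None:
    return memory.get((subject, relation))
```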
arXiv Detail & Related papers (2024-09-18T18:40:02Z)
- LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments [35.3938477255058]
This paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo)
GMeLLo merges the explicit knowledge representation of Knowledge Graphs with the linguistic flexibility of Large Language Models.
Our results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods in the multi-hop question answering benchmark, MQuAKE.
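To picture the KG side of such a pipeline, here is a toy sketch in which edited facts live in a triple store and a multi-hop question, once translated by an LLM into a relation chain, is answered by chained lookups; the entities, relations, and fallback behavior are illustrative assumptions, not GMeLLo's actual components.

```python
# Toy triple store standing in for the knowledge graph in a GMeLLo-style
# pipeline; edits overwrite triples, and relation chains answer multi-hop
# questions. Entities and relations are illustrative.
triples = {  # (subject, relation) -> object
    ("United Kingdom", "head of government"): "Rishi Sunak",
    ("Rishi Sunak", "spouse"): "Akshata Murty",
}

def apply_edit(subject: str, relation: str, new_object: str) -> None:
    triples[(subject, relation)] = new_object  # an edit simply overwrites the triple

def answer_chain(start: str, relations: list[str]) -> str | None:
    entity = start
    for rel in relations:
        entity = triples.get((entity, rel))
        if entity is None:
            return None  # chain broken; the full system would fall back to the LLM
    return entity

# "Who is married to the British Prime Minister?" as a relation chain:
print(answer_chain("United Kingdom", ["head of government", "spouse"]))  # Akshata Murty
```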
arXiv Detail & Related papers (2024-08-28T16:15:45Z)
- Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing [38.590823330865845]
Large language models (LLMs) face challenges with internal knowledge inaccuracies and outdated information.
Knowledge editing has emerged as a pivotal approach to mitigate these issues.
We propose a novel knowledge editing method that incorporates a Knowledge Erasure mechanism for Large language model Editing (KELE)
arXiv Detail & Related papers (2024-08-22T14:53:33Z)
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
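One way to read "an idealized Bayesian agent" is an agent that holds a distribution over candidate answers and applies Bayes' rule when an edit supplies evidence; the toy update below illustrates that reading and is not the paper's construction.

```python
# Toy Bayesian belief update over candidate answers; an illustration of
# "idealized Bayesian agent", not the paper's actual formalism.
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    unnormalized = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

prior = {"Boris Johnson": 0.7, "Rishi Sunak": 0.3}
# An edit asserting the new fact acts as strong evidence for one hypothesis.
evidence = {"Boris Johnson": 0.05, "Rishi Sunak": 0.95}
print(bayes_update(prior, evidence))  # posterior concentrates on "Rishi Sunak"
```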
arXiv Detail & Related papers (2024-06-27T17:33:03Z)
- Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge [93.54427119091174]
We propose outDated ISsue aware deCOding (DISCO) to enhance the performance of edited models on reasoning questions.
We capture the difference in the probability distribution between the original and edited models.
We then amplify this difference in the edited model's token predictions to alleviate the outdated issue.
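The mechanism as described resembles contrastive decoding; in the sketch below, the combination rule and the `alpha` weight are assumptions rather than the paper's exact formulation.

```python
# Sketch of difference-amplified decoding between an original and an
# edited model; alpha and the combination rule are assumptions.
import numpy as np

def adjusted_logits(logits_edited: np.ndarray, logits_original: np.ndarray,
                    alpha: float = 1.0) -> np.ndarray:
    # Boost tokens whose score rose after editing, damp those that fell.
    return logits_edited + alpha * (logits_edited - logits_original)

logits_orig = np.array([2.0, 1.0, 0.5])  # original model prefers token 0 (stale answer)
logits_edit = np.array([1.6, 1.7, 0.5])  # edited model weakly prefers token 1
print(int(np.argmax(adjusted_logits(logits_edit, logits_orig))))  # -> 1
```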
arXiv Detail & Related papers (2024-06-05T03:00:15Z)
- WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models [78.22291694903659]
Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses.
Where the updated knowledge should reside in memories is a fundamental question for model editing.
We propose WISE to bridge the gap between the model's parametric long-term memory and retrieval-style working memory.
arXiv Detail & Related papers (2024-05-23T16:35:52Z)
- Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering [47.199078631274745]
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge.
We propose the Retrieval-Augmented model Editing (RAE) framework for multi-hop question answering.
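The summary gives no mechanism, so the skeleton below shows only the generic retrieve-then-prompt pattern such a framework builds on; the overlap-based scorer and prompt format are placeholders, and RAE itself selects and prunes facts with more sophisticated criteria.

```python
# Generic retrieve-then-prompt skeleton; the scorer and prompt template
# are placeholders, not RAE's actual method.
def score(fact: str, question: str) -> int:
    # Crude relevance: count of shared lowercase tokens.
    return len(set(fact.lower().split()) & set(question.lower().split()))

def build_prompt(question: str, edited_facts: list[str], k: int = 3) -> str:
    retrieved = sorted(edited_facts, key=lambda f: score(f, question), reverse=True)[:k]
    return "Facts:\n" + "\n".join(retrieved) + f"\n\nQuestion: {question}\nAnswer:"

facts = ["The Prime Minister of the United Kingdom is Rishi Sunak."]
print(build_prompt("Who is married to the British Prime Minister?", facts))
```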
arXiv Detail & Related papers (2024-03-28T17:47:19Z)
- PokeMQA: Programmable knowledge editing for Multi-hop Question Answering [46.80110170981976]
Multi-hop question answering (MQA) is one of the challenging tasks for evaluating a machine's comprehension and reasoning abilities.
We propose a framework, Programmable knowledge editing for Multi-hop Question Answering (PokeMQA).
Specifically, we prompt LLMs to decompose knowledge-augmented multi-hop questions, while interacting with a detached, trainable scope detector that modulates LLM behavior depending on an external conflict signal.
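PokeMQA's detector is a trained model; the stand-in below uses plain string similarity with an assumed threshold purely to illustrate what a scope detector's conflict signal looks like.

```python
# Toy scope detector: flag a subquestion as conflicting when it matches a
# stored edit closely enough. The real detector is trainable; the
# string-similarity test and the 0.4 threshold here are assumptions.
from difflib import SequenceMatcher

def in_scope(subquestion: str, edit_statements: list[str],
             threshold: float = 0.4) -> str | None:
    # Return the conflicting edit if the subquestion falls in its scope, else None.
    sim = lambda e: SequenceMatcher(None, e.lower(), subquestion.lower()).ratio()
    best = max(edit_statements, key=sim)
    return best if sim(best) >= threshold else None

edits = ["The Prime Minister of the United Kingdom is Rishi Sunak."]
print(in_scope("Who is the Prime Minister of the United Kingdom?", edits))
```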
arXiv Detail & Related papers (2023-12-23T08:32:13Z)
- Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models [68.03946716358335]
We find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
This is surprising because we would expect that localizing facts to specific model parameters would tell us where to manipulate knowledge in models.
Our results suggest, counterintuitively, that better mechanistic understanding of how pretrained language models work may not always translate to insights about how to best change their behavior.
arXiv Detail & Related papers (2023-01-10T21:26:08Z)