PokeMQA: Programmable knowledge editing for Multi-hop Question Answering
- URL: http://arxiv.org/abs/2312.15194v2
- Date: Thu, 15 Feb 2024 03:10:29 GMT
- Title: PokeMQA: Programmable knowledge editing for Multi-hop Question Answering
- Authors: Hengrui Gu, Kaixiong Zhou, Xiaotian Han, Ninghao Liu, Ruobing Wang,
Xin Wang
- Abstract summary: Multi-hop question answering (MQA) is a challenging task for evaluating a machine's comprehension and reasoning abilities.
We propose a framework, Programmable knowledge editing for Multi-hop Question Answering (PokeMQA).
Specifically, we prompt LLMs to decompose knowledge-augmented multi-hop questions while interacting with a detached trainable scope detector that modulates LLM behavior depending on an external conflict signal.
- Score: 46.80110170981976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-hop question answering (MQA) is a challenging task for
evaluating a machine's comprehension and reasoning abilities, one on which
large language models (LLMs) have widely achieved human-comparable
performance. Because knowledge facts change in the real world, knowledge
editing has been explored to update a model with up-to-date facts while
avoiding expensive re-training or fine-tuning. Starting from the edited fact,
the updated model needs to propagate cascading changes along the reasoning
chain of MQA. Prior art simply adopts a mix-up prompt to instruct LLMs to
conduct multiple reasoning tasks sequentially, including question
decomposition, answer generation, and conflict checking by comparison with
the edited facts. However, coupling these functionally diverse reasoning
tasks inhibits LLMs' strengths in comprehending and answering questions while
burdening them with conflict checking, a task at which they are unskilled. We
thus propose a framework, Programmable knowledge editing for Multi-hop
Question Answering (PokeMQA), to decouple the jobs. Specifically, we prompt
LLMs to decompose knowledge-augmented multi-hop questions while interacting
with a detached trainable scope detector that modulates LLM behavior
depending on an external conflict signal. Experiments on three LLM backbones
and two benchmark datasets validate the superiority of our method in
knowledge editing for MQA: it outperforms all competitors by a large margin
in almost all settings and consistently produces a reliable reasoning
process.
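To make the decoupling concrete, the control flow the abstract describes can be read as the minimal sketch below. Every name in it (`EditedFact`, `decompose`, `answer`, `in_scope`) is a hypothetical stand-in, not PokeMQA's actual interface; the sketch only illustrates that the conflict signal comes from a detached, trainable scope detector rather than from prompting the LLM itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EditedFact:
    """Hypothetical record for one edited knowledge triple."""
    subject: str
    relation: str
    new_object: str

def answer_with_edits(
    question: str,
    edits: list[EditedFact],
    decompose: Callable[[str, str], Optional[str]],  # LLM: next atomic sub-question, or None when done
    answer: Callable[[str], str],                    # LLM: answer to one atomic sub-question
    in_scope: Callable[[str, EditedFact], float],    # detached scope detector: P(edit covers sub-question)
    threshold: float = 0.5,
) -> str:
    history = ""
    final_answer = ""
    while (sub_q := decompose(question, history)) is not None:
        # The conflict signal is external: a trainable classifier scores each
        # edit against the sub-question instead of asking the LLM to compare.
        score, best = max(
            ((in_scope(sub_q, e), e) for e in edits),
            key=lambda pair: pair[0],
            default=(0.0, None),
        )
        if best is not None and score >= threshold:
            final_answer = best.new_object   # in scope: answer with the edited fact
        else:
            final_answer = answer(sub_q)     # out of scope: use the LLM's own knowledge
        history += f"\nQ: {sub_q}\nA: {final_answer}"
    return final_answer
```

In the framework itself the detector is trained separately and the decomposition prompt is knowledge-augmented; the sketch captures only the division of labor between the LLM and the detector.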
Related papers
- GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? [1.5776201492893507]
"GenSco" is a novel approach of selecting passages based on the predicted decomposition of the multi-hop questions.
We evaluate on three broadly established multi-hop question answering datasets.
arXiv Detail & Related papers (2024-07-14T15:25:08Z)
- FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering [26.398873686905063]
Large Language Models (LLMs) with chain-of-thought (CoT) prompting have demonstrated impressive abilities on simple natural language inference tasks.
We propose a prompting method, Finite State Machine (FSM), to enhance the reasoning capabilities of LLMs for complex tasks.
arXiv Detail & Related papers (2024-07-03T10:01:01Z)
- Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models [47.199078631274745]
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge updates.
We propose the Retrieval-Augmented model Editing (RAE) framework tailored for multi-hop question answering.
arXiv Detail & Related papers (2024-03-28T17:47:19Z)
- Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA (see the sketch after this list).
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
- Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks [121.74957524305283]
This paper proposes a novel framework named Search-in-the-Chain (SearChain) for the interaction between Information Retrieval (IR) and Large Language Models (LLMs).
Experiments show that SearChain outperforms state-of-the-art baselines on complex knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-28T10:15:25Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
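As a companion to the FreshLLMs entry above, here is a minimal sketch of a search-augmented few-shot prompt in the spirit of FreshPrompt. The field names, snippet layout, and recency heuristic are assumptions for illustration, not the paper's exact format.

```python
from datetime import date

def build_search_augmented_prompt(
    question: str,
    evidences: list[dict],        # assumed shape: {"source": str, "date": str, "snippet": str}
    demonstrations: list[str],    # few-shot examples of evidence-grounded answering
    max_evidences: int = 5,
) -> str:
    """Assemble a prompt from few-shot demos plus retrieved web evidence."""
    # Keep only the most recent snippets (ISO-format date strings assumed),
    # ordered oldest-to-newest so the freshest evidence sits closest to the
    # final answer cue.
    kept = sorted(evidences, key=lambda e: e["date"])[-max_evidences:]
    lines = list(demonstrations)
    lines.append(f"query: {question}")
    lines.extend(
        f"source: {ev['source']} | date: {ev['date']} | evidence: {ev['snippet']}"
        for ev in kept
    )
    lines.append(f"Today's date is {date.today().isoformat()}.")
    lines.append("answer:")
    return "\n".join(lines)
```

Placing the freshest snippet last is a simple way to bias a recency-sensitive question toward up-to-date evidence; the exact ordering used in the paper is not reproduced here.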