Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of
Reasoning Steps
- URL: http://arxiv.org/abs/2011.01060v2
- Date: Thu, 12 Nov 2020 07:47:48 GMT
- Title: Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of
Reasoning Steps
- Authors: Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara and Akiko Aizawa
- Abstract summary: A multi-hop question answering dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question.
Previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question.
We present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data.
- Score: 31.472490306390977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A multi-hop question answering (QA) dataset aims to test reasoning and
inference skills by requiring a model to read multiple paragraphs to answer a
given question. However, current datasets do not provide a complete explanation
for the reasoning process from the question to the answer. Further, previous
studies revealed that many examples in existing multi-hop datasets do not
require multi-hop reasoning to answer a question. In this study, we present a
new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and
unstructured data. In our dataset, we introduce the evidence information
containing a reasoning path for multi-hop questions. The evidence information
has two benefits: (i) providing a comprehensive explanation for predictions and
(ii) evaluating the reasoning skills of a model. We carefully design a pipeline
and a set of templates for generating question-answer pairs that guarantee
the multi-hop steps and the quality of the questions. We also exploit the
structured format in Wikidata and use logical rules to create questions that
are natural but still require multi-hop reasoning. Through experiments, we
demonstrate that our dataset is challenging for multi-hop models and it ensures
that multi-hop reasoning is required.
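As a minimal illustration of the kind of pipeline the abstract describes, the sketch below composes a 2-hop question from Wikidata-style triples and records the evidence as a reasoning path. The triples, template, and entity names are illustrative assumptions, not data or code from 2WikiMultiHopQA itself.

```python
# Sketch: compose a 2-hop question from Wikidata-style (subject, relation) triples.
# All triples, the template, and the entity names below are hypothetical examples.

TRIPLES = {
    ("Inception", "director"): "Christopher Nolan",
    ("Christopher Nolan", "date of birth"): "30 July 1970",
}

def compose_two_hop(entity, rel1, rel2):
    """Chain two single-hop lookups; return the question, answer, and evidence path."""
    bridge = TRIPLES[(entity, rel1)]   # hop 1: entity --rel1--> bridge entity
    answer = TRIPLES[(bridge, rel2)]   # hop 2: bridge --rel2--> answer
    question = f"What is the {rel2} of the {rel1} of {entity}?"
    evidence = [(entity, rel1, bridge), (bridge, rel2, answer)]
    return question, answer, evidence

q, a, path = compose_two_hop("Inception", "director", "date of birth")
```

Because the question is built by chaining the triples, answering it genuinely requires both hops, and the `evidence` list is exactly the reasoning path that can be used to evaluate a model's intermediate steps.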
Related papers
- MoreHopQA: More Than Multi-hop Reasoning [32.94332511203639]
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers.
Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue.
Our results show that models perform well on initial multi-hop questions but struggle with our extended questions.
arXiv Detail & Related papers (2024-06-19T09:38:59Z)
- Explainable Multi-hop Question Generation: An End-to-End Approach without Intermediate Question Labeling [6.635572580071933]
Multi-hop question generation aims to generate complex questions that require multi-step reasoning over several documents.
Previous studies have predominantly utilized end-to-end models, wherein questions are decoded based on the representation of context documents.
This paper introduces an end-to-end question rewriting model that increases question complexity through sequential rewriting.
arXiv Detail & Related papers (2024-03-31T06:03:54Z)
- How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- MuSiQue: Multi-hop Questions via Single-hop Question Composition [36.84063888323547]
Constructing multi-hop questions as a composition of single-hop questions allows us to exercise greater control over the quality of the resulting multi-hop questions.
We use this process to construct a new multi-hop QA dataset, MuSiQue-Ans, with 25K 2-4 hop questions built from seed questions in five existing single-hop datasets.
arXiv Detail & Related papers (2021-08-02T00:33:27Z)
- Unsupervised Multi-hop Question Answering by Question Generation [108.61653629883753]
MQA-QG is an unsupervised framework that can generate human-like multi-hop training data.
Using only generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance.
arXiv Detail & Related papers (2020-10-23T19:13:47Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose MulQG, a multi-hop question generation framework that performs context encoding in multiple hops with a graph convolutional network.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z) - Answering Any-hop Open-domain Questions with Iterative Document
Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.