Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of
Reasoning Steps
- URL: http://arxiv.org/abs/2011.01060v2
- Date: Thu, 12 Nov 2020 07:47:48 GMT
- Title: Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of
Reasoning Steps
- Authors: Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara and Akiko Aizawa
- Abstract summary: A multi-hop question answering dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question.
Previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question.
We present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data.
- Score: 31.472490306390977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A multi-hop question answering (QA) dataset aims to test reasoning and
inference skills by requiring a model to read multiple paragraphs to answer a
given question. However, current datasets do not provide a complete explanation
for the reasoning process from the question to the answer. Further, previous
studies revealed that many examples in existing multi-hop datasets do not
require multi-hop reasoning to answer a question. In this study, we present a
new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and
unstructured data. In our dataset, we introduce the evidence information
containing a reasoning path for multi-hop questions. The evidence information
has two benefits: (i) providing a comprehensive explanation for predictions and
(ii) evaluating the reasoning skills of a model. We carefully design a pipeline
and a set of templates for generating question-answer pairs that guarantee
the multi-hop steps and the quality of the questions. We also exploit the
structured format in Wikidata and use logical rules to create questions that
are natural but still require multi-hop reasoning. Through experiments, we
demonstrate that our dataset is challenging for multi-hop models and it ensures
that multi-hop reasoning is required.
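As a minimal illustration of the kind of pipeline the abstract describes, the sketch below composes a 2-hop question from Wikidata-style triples and records the evidence as a reasoning path. The triples, template, and entity names are illustrative assumptions, not data or code from 2WikiMultiHopQA itself.

```python
# Sketch: compose a 2-hop question from Wikidata-style (subject, relation) triples.
# All triples, the template, and the entity names below are hypothetical examples.

TRIPLES = {
    ("Inception", "director"): "Christopher Nolan",
    ("Christopher Nolan", "date of birth"): "30 July 1970",
}

def compose_two_hop(entity, rel1, rel2):
    """Chain two single-hop lookups; return the question, answer, and evidence path."""
    bridge = TRIPLES[(entity, rel1)]   # hop 1: entity --rel1--> bridge entity
    answer = TRIPLES[(bridge, rel2)]   # hop 2: bridge --rel2--> answer
    question = f"What is the {rel2} of the {rel1} of {entity}?"
    evidence = [(entity, rel1, bridge), (bridge, rel2, answer)]
    return question, answer, evidence

q, a, path = compose_two_hop("Inception", "director", "date of birth")
```

Because the question is built by chaining the triples, answering it genuinely requires both hops, and the `evidence` list is exactly the reasoning path that can be used to evaluate a model's intermediate steps.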
Related papers
- MoreHopQA: More Than Multi-hop Reasoning [32.94332511203639]
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers.
Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue.
Our results show that models perform well on initial multi-hop questions but struggle with our extended questions.
arXiv Detail & Related papers (2024-06-19T09:38:59Z)
- Explainable Multi-hop Question Generation: An End-to-End Approach without Intermediate Question Labeling [6.635572580071933]
Multi-hop question generation aims to generate complex questions that require multi-step reasoning over several documents.
Previous studies have predominantly utilized end-to-end models, wherein questions are decoded based on the representation of context documents.
This paper introduces an end-to-end question rewriting model that increases question complexity through sequential rewriting.
arXiv Detail & Related papers (2024-03-31T06:03:54Z)
- How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- MuSiQue: Multi-hop Questions via Single-hop Question Composition [36.84063888323547]
Constructing multi-hop questions as a composition of single-hop questions allows us to exercise greater control over the quality of the resulting multi-hop questions.
We use this process to construct a new multi-hop QA dataset, MuSiQue-Ans, with 25K 2-4 hop questions built from seed questions in five existing single-hop datasets.
arXiv Detail & Related papers (2021-08-02T00:33:27Z)
- Unsupervised Multi-hop Question Answering by Question Generation [108.61653629883753]
MQA-QG is an unsupervised framework that can generate human-like multi-hop training data.
Using only generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance.
arXiv Detail & Related papers (2020-10-23T19:13:47Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose MulQG, a multi-hop question generation framework that performs context encoding in multiple hops with a graph convolutional network.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z) - Answering Any-hop Open-domain Questions with Iterative Document
Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.