MuSiQue: Multi-hop Questions via Single-hop Question Composition
- URL: http://arxiv.org/abs/2108.00573v1
- Date: Mon, 2 Aug 2021 00:33:27 GMT
- Title: MuSiQue: Multi-hop Questions via Single-hop Question Composition
- Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
- Abstract summary: Constructing multi-hop questions as compositions of single-hop questions allows us to exercise greater control over the quality of the resulting multi-hop questions.
We use this process to construct a new multi-hop QA dataset, MuSiQue-Ans, with ~25K 2-4 hop questions built from seed questions in 5 existing single-hop datasets.
- Score: 36.84063888323547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To build challenging multi-hop question answering datasets, we propose a
bottom-up semi-automatic process of constructing multi-hop questions via
composition of single-hop questions. Constructing multi-hop questions as
compositions of single-hop questions allows us to exercise greater control over
the quality of the resulting multi-hop questions. This process allows building
a dataset with (i) connected reasoning, where each step needs the answer from a
previous step; (ii) minimal train-test leakage, by eliminating even partial
overlap of reasoning steps; (iii) a variable number of hops and composition
structures; and (iv) contrasting unanswerable questions, created by modifying the
context. We use this process to construct a new multi-hop QA dataset:
MuSiQue-Ans, with ~25K 2-4 hop questions using seed questions from 5 existing
single-hop datasets. Our experiments demonstrate that MuSiQue is challenging
for state-of-the-art QA models (e.g., a human-machine gap of ~30 F1 points),
significantly harder than existing datasets (a 2x larger human-machine gap), and
substantially less cheatable (e.g., a single-hop model is worse by 30 F1 points).
We also build an even more challenging dataset, MuSiQue-Full, consisting of
answerable and unanswerable contrast question pairs, where model performance
drops further by 13+ F1 pts. For data and code, see
\url{https://github.com/stonybrooknlp/musique}.
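The "connected reasoning" criterion in (i) — each hop must consume the previous hop's answer — can be sketched as a filter over candidate single-hop question pairs. The sketch below is a toy illustration of the idea, not the authors' released pipeline; the function name and seed questions are made up for the example:

```python
# Toy sketch of bottom-up 2-hop composition (illustrative, not the
# MuSiQue pipeline): the second single-hop question must mention the
# first question's answer (the "bridge" entity). That mention is then
# replaced with a reference to hop 1, so answering the composed
# question requires solving hop 1 first.

def compose_two_hop(q1: str, a1: str, q2: str):
    """Compose q1 and q2 into a 2-hop question.

    Returns None when q2 does not mention a1, i.e. the hops would be
    disconnected and the candidate pair is rejected.
    """
    if a1 not in q2:
        return None
    # Refer back to hop 1 instead of naming the bridge entity directly.
    return q2.replace(a1, f"the answer to '{q1}'")

# Hypothetical seed questions (not drawn from the 5 source datasets):
hop1 = "Who directed Inception?"
bridge_answer = "Christopher Nolan"
hop2 = "Where was Christopher Nolan born?"
print(compose_two_hop(hop1, bridge_answer, hop2))
# Where was the answer to 'Who directed Inception?' born?
```

Rejecting pairs where the bridge entity never appears in the second question is what enforces property (i) in this sketch: disconnected compositions can be answered one hop at a time, which is exactly what a single-hop shortcut model exploits.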
Related papers
- MoreHopQA: More Than Multi-hop Reasoning [32.94332511203639]
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers.
Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue.
Our results show that models perform well on initial multi-hop questions but struggle with our extended questions.
arXiv Detail & Related papers (2024-06-19T09:38:59Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps [31.472490306390977]
A multi-hop question answering dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question.
Previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question.
We present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data.
arXiv Detail & Related papers (2020-11-02T15:42:40Z)
- Unsupervised Multi-hop Question Answering by Question Generation [108.61653629883753]
MQA-QG is an unsupervised framework that can generate human-like multi-hop training data.
Using only the generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance.
arXiv Detail & Related papers (2020-10-23T19:13:47Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose the Multi-hop Encoding Fusion Network for Question Generation (MulQG), which performs context encoding in multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
- Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions? [23.991872322492384]
We investigate whether top-performing models for multi-hop questions understand the underlying sub-questions like humans.
We show that multiple state-of-the-art multi-hop QA models fail to correctly answer a large portion of sub-questions.
Our work takes a step forward towards building a more explainable multi-hop QA system.
arXiv Detail & Related papers (2020-02-23T15:16:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.