Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning
- URL: http://arxiv.org/abs/2005.00789v3
- Date: Tue, 17 Nov 2020 04:18:51 GMT
- Title: Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning
- Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
- Abstract summary: Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts.
We formalize such undesirable behavior as disconnected reasoning across subsets of supporting facts.
Experiments suggest that there hasn't been much progress in multi-hop QA in the reading comprehension setting.
- Score: 50.114651561111245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Has there been real progress in multi-hop question-answering? Models often
exploit dataset artifacts to produce correct answers, without connecting
information across multiple supporting facts. This limits our ability to
measure true progress and defeats the purpose of building multi-hop QA
datasets. We make three contributions towards addressing this. First, we
formalize such undesirable behavior as disconnected reasoning across subsets of
supporting facts. This allows developing a model-agnostic probe for measuring
how much any model can cheat via disconnected reasoning. Second, using a notion
of \emph{contrastive support sufficiency}, we introduce an automatic
transformation of existing datasets that reduces the amount of disconnected
reasoning. Third, our experiments suggest that there hasn't been much progress
in multi-hop QA in the reading comprehension setting. For a recent large-scale
model (XLNet), we show that only 18 points out of its answer F1 score of 72 on
HotpotQA are obtained through multifact reasoning, roughly the same as that of
a simpler RNN baseline. Our transformation substantially reduces disconnected
reasoning (19 points in answer F1). It is complementary to adversarial
approaches, yielding further reductions in conjunction.
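The probe described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `can_cheat`, the stand-in models, and the example facts are all invented for illustration. The idea is simply that a model "cheats" via disconnected reasoning if some proper subset of the supporting facts already yields the correct answer, so the model never has to connect information across all of them.

```python
# Toy sketch of a disconnected-reasoning probe (illustrative only).
# A `model` is any callable mapping (question, facts) -> answer string.
from itertools import combinations

def can_cheat(model, question, supporting_facts, gold_answer):
    """Return True if some proper, non-empty subset of the supporting
    facts alone is enough for the model to produce the gold answer."""
    n = len(supporting_facts)
    for k in range(1, n):  # proper subsets only: never all n facts
        for subset in combinations(supporting_facts, k):
            if model(question, list(subset)) == gold_answer:
                return True
    return False

# A degenerate "model" keying on a surface artifact (a year string):
def lexical_model(question, facts):
    for f in facts:
        if "1879" in f:
            return "1879"
    return None

facts = ["Einstein was born in 1879.", "He developed relativity."]
print(can_cheat(lexical_model,
                "When was the developer of relativity born?",
                facts, "1879"))  # True: a single fact already suffices
```

A model that genuinely connects both facts before answering would return `False` under this probe, which is the distinction the paper's measurement formalizes.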
Related papers
- MoreHopQA: More Than Multi-hop Reasoning [32.94332511203639]
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers.
Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue.
Our results show that models perform well on initial multi-hop questions but struggle with our extended questions.
arXiv Detail & Related papers (2024-06-19T09:38:59Z)
- Counterfactual Multihop QA: A Cause-Effect Approach for Reducing Disconnected Reasoning [5.343815893782489]
Multi-hop QA requires reasoning over multiple supporting facts to answer the question.
We propose a novel counterfactual multihop QA, a cause-effect approach that reduces disconnected reasoning.
Our method improves the Supp$_s$ score on HotpotQA by 5.8 points through true multihop reasoning.
arXiv Detail & Related papers (2022-10-13T16:21:53Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency between QA models' answers to a multi-hop question and to its equivalent chain of single-hop questions.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
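The decomposition-based consistency check can be sketched as follows. This is a toy illustration, not the paper's models: the lookup-table "QA model", the knowledge base, and the questions are invented. The check answers the first hop, substitutes the bridge entity into the second hop, and compares the chained result with the model's direct multi-hop answer.

```python
# Illustrative consistency check via question decomposition (toy example).
def qa(question, kb):
    """Stand-in single-hop QA model: exact-match lookup in a toy KB."""
    return kb.get(question)

kb = {
    "Who developed relativity?": "Einstein",
    "When was Einstein born?": "1879",
    # The direct multi-hop answer disagrees with the chain on purpose:
    "When was the developer of relativity born?": "1905",
}

def chained_answer(hop1, hop2_template, kb):
    """Answer hop 1, substitute the bridge entity into hop 2, answer hop 2."""
    bridge = qa(hop1, kb)
    return qa(hop2_template.format(bridge), kb)

direct = qa("When was the developer of relativity born?", kb)
chained = chained_answer("Who developed relativity?", "When was {} born?", kb)
print(direct == chained)  # False: this model is inconsistent on the pair
```

Aggregating such mismatches over a dataset gives a simple inconsistency rate of the kind the paper reports.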
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating explanations as a chain of thought improves question-answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering [0.0]
Previous adversarial attack works usually edit the whole question sentence.
We propose a multi-hop reasoning chain based adversarial attack method.
Results demonstrate significant performance reduction on both answer and supporting facts prediction.
arXiv Detail & Related papers (2021-12-17T18:03:14Z)
- Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization [29.797379277423143]
We develop a new parameterization of set-valued retrieval that properly handles unanswerable queries.
We show that marginalizing over this set during training allows a model to mitigate false negatives in the annotated supporting evidence.
On IIRC, we show that joint modeling with marginalization on alternative contexts improves model performance by 5.5 F1 points and achieves a new state-of-the-art performance of 50.6 F1.
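A minimal numerical sketch of the marginalization idea, under my own simplifying assumptions rather than the paper's parameterization: treat the retrieved context as a latent variable and compute the marginal answer log-likelihood with a log-sum-exp over candidate contexts, so a "false negative" context that actually supports the answer still contributes probability mass during training.

```python
# Sketch: log p(a | q) = logsumexp_c [ log p(c | q) + log p(a | q, c) ]
# The probabilities below are invented numbers, not model outputs.
import math

def log_marginal(log_retrieval_scores, log_answer_given_context):
    """Numerically stable logsumexp of per-context joint log-probs."""
    joints = [r + a for r, a in zip(log_retrieval_scores,
                                    log_answer_given_context)]
    m = max(joints)
    return m + math.log(sum(math.exp(j - m) for j in joints))

# Two candidates: one annotated positive, one "false negative" that
# nonetheless supports the answer well.
log_p_context = [math.log(0.7), math.log(0.3)]
log_p_answer = [math.log(0.9), math.log(0.8)]
print(log_marginal(log_p_context, log_p_answer))  # log(0.63 + 0.24)
```

Maximizing this marginal instead of the likelihood of a single annotated context is what lets the model recover when the annotation misses a genuinely supporting passage.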
arXiv Detail & Related papers (2021-03-22T23:44:35Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over scattered evidence from different paragraphs.
We propose MulQG, a multi-hop question generation framework that performs context encoding in multiple hops with a Graph Convolutional Network.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.