Learning to Explain: Datasets and Models for Identifying Valid Reasoning
Chains in Multihop Question-Answering
- URL: http://arxiv.org/abs/2010.03274v1
- Date: Wed, 7 Oct 2020 08:46:02 GMT
- Title: Learning to Explain: Datasets and Models for Identifying Valid Reasoning
Chains in Multihop Question-Answering
- Authors: Harsh Jhamtani and Peter Clark
- Abstract summary: We introduce three datasets in which explanations formed from corpus facts are annotated.
eQASC contains over 98K explanation annotations for the multihop question answering dataset QASC.
eQASC-perturbed is constructed by crowd-sourcing perturbations to test consistency and generalization of explanation prediction models.
eOBQA is constructed by adding explanation annotations to the OBQA dataset to test generalization of models trained on eQASC.
- Score: 28.67167530758428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the rapid progress in multihop question-answering (QA), models still
have trouble explaining why an answer is correct, with limited explanation
training data available to learn from. To address this, we introduce three
explanation datasets in which explanations formed from corpus facts are
annotated. Our first dataset, eQASC, contains over 98K explanation annotations
for the multihop question answering dataset QASC, and is the first to annotate
multiple candidate explanations for each answer. The second dataset
eQASC-perturbed is constructed by crowd-sourcing perturbations (while
preserving their validity) of a subset of explanations in QASC, to test
consistency and generalization of explanation prediction models. The third
dataset eOBQA is constructed by adding explanation annotations to the OBQA
dataset to test generalization of models trained on eQASC. We show that this
data can be used to significantly improve explanation quality (+14% absolute F1
over a strong retrieval baseline) using a BERT-based classifier, though performance
still falls short of the upper bound, offering a new challenge for future research. We also
explore a delexicalized chain representation in which repeated noun phrases are
replaced by variables, thus turning them into generalized reasoning chains (for
example: "X is a Y" AND "Y has Z" IMPLIES "X has Z"). We find that generalized
chains maintain performance while also being more robust to certain
perturbations.
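
To make the two modeling ideas above concrete, here is a minimal sketch of a BERT-based chain classifier of the kind the reported gain is attributed to. The paper's exact architecture and input format are not given in this summary, so the input packing (question and answer as one segment, the concatenated chain facts as the other), the maximum length, and the example question are illustrative assumptions; a real system would first fine-tune the classifier head on the eQASC annotations.

```python
# Minimal sketch (not the authors' exact model): score a candidate explanation
# chain with a BERT binary classifier via Hugging Face transformers. The input
# packing and max length are illustrative assumptions, and the classifier head
# is untrained here; in practice it would be fine-tuned on eQASC labels.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def score_chain(question: str, answer: str, chain: list[str]) -> float:
    """Return the model's probability that `chain` validly explains `answer`."""
    inputs = tokenizer(f"{question} {answer}", " ".join(chain),
                       return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Rank two candidate chains for a QASC-style question.
question, answer = "What do plants require to make food?", "sunlight"
candidates = [
    ["Plants perform photosynthesis.", "Photosynthesis requires sunlight."],
    ["Plants are green.", "Green is a color."],
]
print(max(candidates, key=lambda c: score_chain(question, answer, c)))
```

The delexicalized ("generalized") chain representation can likewise be sketched: noun phrases that repeat across the facts of a chain are replaced by variables, so a chain such as "a squirrel is a rodent" AND "a rodent has incisors" becomes "a squirrel is X" AND "X has incisors", matching the "X is a Y" / "Y has Z" pattern above. The code below is a rough illustration only, not the paper's procedure; real noun-phrase identification would use a parser rather than raw word-span overlap, and the helper names are made up for this sketch.

```python
# Rough sketch of delexicalization: word spans repeated across the facts of a
# chain stand in for shared noun phrases and are replaced by variables
# (X, Y, Z, ...). A parser would normally identify noun phrases; string
# overlap is used here purely to illustrate the representation.
import re
from collections import Counter
from itertools import count

def delexicalize(chain: list[str]) -> list[str]:
    def spans(fact: str) -> set[str]:
        words = re.findall(r"[a-z]+", fact.lower())
        return {" ".join(words[i:i + n]) for n in (3, 2, 1)
                for i in range(len(words) - n + 1)}

    # Spans occurring in more than one fact are candidates for variables.
    counts = Counter(s for fact in chain for s in spans(fact))
    shared = {s for s, c in counts.items() if c > 1}
    # Keep only maximal spans (drop those contained in a longer shared span).
    shared = {s for s in shared if not any(s != t and s in t for t in shared)}

    names = (chr(c) for c in count(ord("X")))  # X, Y, Z, ...
    variables: dict[str, str] = {}
    generalized = []
    for fact in chain:
        for span in sorted(shared, key=len, reverse=True):
            if span not in variables:
                variables[span] = next(names)
            fact = re.sub(re.escape(span), variables[span], fact, flags=re.IGNORECASE)
        generalized.append(fact)
    return generalized

print(delexicalize(["a squirrel is a rodent", "a rodent has incisors"]))
# -> ['a squirrel is X', 'X has incisors']
```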
Related papers
- GSQA: An End-to-End Model for Generative Spoken Question Answering [54.418723701886115]
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that SQA improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a ⟨passage, answer⟩ pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- FeTaQA: Free-form Table Question Answering [33.018256483762386]
We introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs.
FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source.
arXiv Detail & Related papers (2021-04-01T09:59:40Z)
- QED: A Framework and Dataset for Explanations in Question Answering [27.85923397716627]
We release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset.
A promising result suggests that training on a relatively small amount of QED data can improve question answering.
arXiv Detail & Related papers (2020-09-08T23:34:18Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)