What do Models Learn from Question Answering Datasets?
- URL: http://arxiv.org/abs/2004.03490v2
- Date: Tue, 13 Oct 2020 13:02:44 GMT
- Title: What do Models Learn from Question Answering Datasets?
- Authors: Priyanka Sen and Amir Saffari
- Abstract summary: We investigate if models are learning reading comprehension from question answering datasets.
We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations.
We make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While models have reached superhuman performance on popular question
answering (QA) datasets such as SQuAD, they have yet to outperform humans on
the task of question answering itself. In this paper, we investigate if models
are learning reading comprehension from QA datasets by evaluating BERT-based
models across five datasets. We evaluate models on their generalizability to
out-of-domain examples, responses to missing or incorrect data, and ability to
handle question variations. We find that no single dataset is robust to all of
our experiments and identify shortcomings in both datasets and evaluation
methods. Following our analysis, we make recommendations for building future QA
datasets that better evaluate the task of question answering through reading
comprehension. We also release code to convert QA datasets to a shared format
for easier experimentation at
https://github.com/amazon-research/qa-dataset-converter.
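The released converter's actual interface and schema are not shown in the abstract; as an illustration of the idea, here is a minimal sketch that flattens a SQuAD-style input into uniform per-question records. The shared-format field names (`qid`, `question`, `context`, `answers`) are hypothetical, not the schema of the released tool.

```python
import json

def squad_to_shared(squad_data):
    """Flatten a SQuAD-style dict into a list of uniform QA records.

    The output field names are illustrative; the released converter
    may use a different shared format.
    """
    records = []
    for article in squad_data["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                records.append({
                    "qid": qa["id"],
                    "question": qa["question"],
                    "context": para["context"],
                    "answers": [a["text"] for a in qa["answers"]],
                })
    return records

# Tiny worked example: one article, one paragraph, one question.
sample = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [{
                "id": "q1",
                "question": "What is the capital of France?",
                "answers": [{"text": "Paris", "answer_start": 0}],
            }],
        }],
    }]
}

records = squad_to_shared(sample)
print(json.dumps(records[0]))
```

A flat record list like this makes it easy to run the same model and evaluation code across datasets that ship in different nested layouts.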
Related papers
- GSQA: An End-to-End Model for Generative Spoken Question Answering [54.418723701886115]
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
arXiv Detail & Related papers (2023-10-30T10:14:52Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering [21.857273918785452]
Disfl-QA is a new challenge question answering dataset.
Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text.
We show data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning.
arXiv Detail & Related papers (2021-06-08T00:03:40Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
- Unsupervised Evaluation for Question Answering with Transformers [46.16837670041594]
We investigate the hidden representations of questions, answers, and contexts in transformer-based QA architectures.
We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer is correct.
We are able to predict whether or not a model's answer is correct with 91.37% accuracy on SQuAD and 80.7% accuracy on SubjQA.
arXiv Detail & Related papers (2020-10-07T07:03:30Z)
- When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised [0.0]
Question Answering (QA) is key to enabling robust communication between humans and machines.
Modern language models used for QA have surpassed human performance on several essential tasks.
This paper studies augmenting human-made datasets with synthetic data as a way of surmounting this problem.
arXiv Detail & Related papers (2020-10-04T15:56:44Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.