Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question
Answering
- URL: http://arxiv.org/abs/2106.04016v1
- Date: Tue, 8 Jun 2021 00:03:40 GMT
- Title: Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question
Answering
- Authors: Aditya Gupta, Jiacheng Xu, Shyam Upadhyay, Diyi Yang, Manaal Faruqui
- Abstract summary: Disfl-QA is a new challenge question answering dataset.
Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text.
We show that data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning.
- Score: 21.857273918785452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disfluencies are an under-studied topic in NLP, even though they are ubiquitous
in human conversation. This is largely due to the lack of datasets containing
disfluencies. In this paper, we present a new challenge question answering
dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual
disfluencies in previously fluent questions. Disfl-QA contains a variety of
challenging disfluencies that require a more comprehensive understanding of the
text than what was necessary in prior datasets. Experiments show that the
performance of existing state-of-the-art question answering models degrades
significantly when tested on Disfl-QA in a zero-shot setting. We show that data
augmentation methods partially recover the loss in performance and also
demonstrate the efficacy of using gold data for fine-tuning. We argue that we
need large-scale disfluency datasets in order for NLP models to be robust to
them. The dataset is publicly available at:
https://github.com/google-research-datasets/disfl-qa.
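The zero-shot comparison described above can be reproduced at a small scale by asking a SQuAD-trained QA model each question in both its fluent and disfluent form. The sketch below is illustrative only: the assumed dev.json layout (a dict keyed by SQuAD v2 question id with "original" and "disfluent" fields) and the deepset/roberta-base-squad2 checkpoint are assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch: compare a SQuAD-trained QA model on fluent vs. disfluent
# questions from Disfl-QA. Assumes dev.json (from the GitHub repo above) is a
# dict keyed by SQuAD v2 question id with "original" and "disfluent" fields --
# check the repo's README, as this layout is an assumption of this sketch.
import json

from datasets import load_dataset      # pip install datasets
from transformers import pipeline      # pip install transformers

with open("disfl_qa/dev.json") as f:
    disfl = json.load(f)

# Map SQuAD v2 validation question ids to their passages so each disfluent
# question is asked against the same context as its fluent original.
squad = load_dataset("squad_v2", split="validation")
context_by_id = {ex["id"]: ex["context"] for ex in squad}

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

shown = 0
for qid, pair in disfl.items():
    context = context_by_id.get(qid)
    if context is None:
        continue  # id not found in this split; skip rather than fail
    for kind in ("original", "disfluent"):
        pred = qa(question=pair[kind], context=context)
        print(f"{kind:10s} -> {pred['answer']!r} (score={pred['score']:.3f})")
    shown += 1
    if shown == 3:  # a handful of examples is enough to inspect the gap
        break
```

If the reported degradation holds, the disfluent variants should yield noticeably lower scores and more wrong spans than their fluent counterparts.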
Related papers
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs using only unlabeled test data.
Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model (an illustrative sketch of the selection step appears after this list).
arXiv Detail & Related papers (2023-10-08T04:44:36Z)
- Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap [25.80004272277982]
Recent neural QG models are biased towards generating questions with high lexical overlap.
We propose a synonym replacement-based approach to augment questions with low lexical overlap.
arXiv Detail & Related papers (2021-09-23T09:53:54Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Overcoming Language Priors with Self-supervised Learning for Visual Question Answering [62.88124382512111]
Most Visual Question Answering (VQA) models suffer from the language prior problem.
We introduce a self-supervised learning framework to solve this problem.
Our method can significantly outperform the state-of-the-art.
arXiv Detail & Related papers (2020-12-17T12:30:12Z)
- When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised [0.0]
Question Answering (QA) is key for making possible a robust communication between human and machine.
Modern language models used for QA have surpassed human performance in several essential tasks, but they depend on large amounts of annotated data.
This paper studies augmenting human-made datasets with synthetic data as a way of surmounting this problem.
arXiv Detail & Related papers (2020-10-04T15:56:44Z)
- What do Models Learn from Question Answering Datasets? [2.28438857884398]
We investigate if models are learning reading comprehension from question answering datasets.
We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations.
We make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension.
arXiv Detail & Related papers (2020-04-07T15:41:55Z)
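The MinPrompt entry above sketches a pipeline: build a graph over factual sentences, select a minimal subset of sentences that covers the text's information, then generate QA pairs from that subset. The snippet below illustrates only the selection step with a greedy set-cover heuristic; the fact_units stand-in (capitalised tokens instead of real entity extraction) and the coverage criterion are simplifying assumptions, not the paper's actual graph algorithm.

```python
# Illustrative greedy set-cover over sentences: keep picking the sentence that
# covers the most still-uncovered "fact units" until everything is covered.
# This only approximates the minimal-coverage selection described for MinPrompt.
from typing import Dict, List, Set


def fact_units(sentence: str) -> Set[str]:
    """Stand-in for entity extraction: treat capitalised tokens as fact units."""
    return {tok.strip(".,") for tok in sentence.split() if tok[:1].isupper()}


def greedy_minimal_cover(sentences: List[str]) -> List[str]:
    """Greedily pick sentences until every fact unit in the text is covered."""
    units: Dict[str, Set[str]] = {s: fact_units(s) for s in sentences}
    uncovered: Set[str] = set().union(*units.values())
    chosen: List[str] = []
    while uncovered:
        best = max(sentences, key=lambda s: len(units[s] & uncovered))
        gain = units[best] & uncovered
        if not gain:
            break  # remaining units appear in no sentence; stop
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    passage = [
        "Marie Curie won the Nobel Prize in Physics in 1903.",
        "She shared the 1903 prize with Pierre Curie and Henri Becquerel.",
        "Curie later won a second Nobel Prize, this time in Chemistry.",
    ]
    for sentence in greedy_minimal_cover(passage):
        print("selected:", sentence)
```

QA-pair generation over the chosen sentences would then follow, but that step depends on the paper's prompt construction and is omitted here.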
This list is automatically generated from the titles and abstracts of the papers on this site.