Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap
- URL: http://arxiv.org/abs/2109.11256v1
- Date: Thu, 23 Sep 2021 09:53:54 GMT
- Title: Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap
- Authors: Kazutoshi Shinoda and Saku Sugawara and Akiko Aizawa
- Abstract summary: Recent neural QG models are biased towards generating questions with high lexical overlap.
We propose a synonym replacement-based approach to augment questions with low lexical overlap.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question answering (QA) models for reading comprehension have been
demonstrated to exploit unintended dataset biases such as question-context
lexical overlap. This hinders QA models from generalizing to under-represented
samples such as questions with low lexical overlap. Question generation (QG), a
method for augmenting QA datasets, can be a solution for such performance
degradation if QG can properly debias QA datasets. However, we discover that
recent neural QG models are biased towards generating questions with high
lexical overlap, which can amplify the dataset bias. Moreover, our analysis
reveals that data augmentation with these QG models frequently impairs the
performance on questions with low lexical overlap, while improving that on
questions with high lexical overlap. To address this problem, we use a synonym
replacement-based approach to augment questions with low lexical overlap. We
demonstrate that the proposed data augmentation approach is simple yet
effective in mitigating the degradation problem with only 70k synthetic examples.
Our data is publicly available at
https://github.com/KazutoshiShinoda/Synonym-Replacement.
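
The abstract's two key ingredients, measuring question-context lexical overlap and lowering it via synonym replacement, can be illustrated with a short sketch. The Python snippet below is a minimal illustration under our own assumptions (regex tokenization, NLTK WordNet as the synonym source, a small stopword guard, deterministic synonym choice); the paper's actual augmentation procedure is more careful about part of speech and meaning preservation and is not reproduced here.

```python
# Minimal sketch: measure question-context lexical overlap and lower it
# by swapping overlapping content words for WordNet synonyms.
# Helper names and heuristics are illustrative, not the paper's code.
import re
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were",
             "in", "of", "for", "to", "on", "by"}

def tokenize(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def lexical_overlap(question, context):
    """Fraction of unique question tokens that also appear in the context."""
    q, c = set(tokenize(question)), set(tokenize(context))
    return len(q & c) / max(len(q), 1)

def synonym_replace(question, context):
    """Replace non-stopword question words that also occur in the context."""
    c = set(tokenize(context))
    out = []
    for word in question.split():
        core = word.strip(".,?!").lower()
        if core in c and core not in STOPWORDS:
            synonyms = {
                lemma.name().replace("_", " ")
                for synset in wordnet.synsets(core)
                for lemma in synset.lemmas()
            } - {core}
            if synonyms:
                # Deterministic pick; a real system would filter by POS etc.
                word = word.lower().replace(core, sorted(synonyms)[0])
        out.append(word)
    return " ".join(out)

context = "The company was founded in 1889 by two engineers."
question = "When was the company founded?"
augmented = synonym_replace(question, context)
print(augmented)
print(lexical_overlap(question, context), "->", lexical_overlap(augmented, context))
```

On this toy pair, the rewritten question shares fewer tokens with the context, which is exactly the kind of low-overlap sample the paper argues is under-represented in QG-augmented data.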
Related papers
- GSQA: An End-to-End Model for Generative Spoken Question Answering
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Event Extraction as Question Generation and Answering
Recent work on Event Extraction has reframed the task as Question Answering (QA).
We propose QGA-EE, which enables a Question Generation (QG) model to generate questions that incorporate rich contextual information instead of using fixed templates.
Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset.
arXiv Detail & Related papers (2023-07-10T01:46:15Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation
Question Generation (QG) is the task of generating a plausible question for a ⟨passage, answer⟩ pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
Disfl-QA is a new challenge dataset for question answering.
Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text.
We show data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning.
arXiv Detail & Related papers (2021-06-08T00:03:40Z)
- A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering
We show that question rewriting (QR) of the conversational context allows us to shed more light on this phenomenon.
We present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets.
arXiv Detail & Related papers (2020-10-13T06:29:51Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance (a toy sketch of such templating follows below).
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
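
The template-based idea in the last entry is simple enough to sketch. Below is a toy illustration under our own assumptions (a spaCy NER pipeline and a hand-written wh-word table keyed on entity type); the paper itself applies templates to a related retrieved sentence rather than the original context sentence, and its templates are richer than this.

```python
# Toy sketch of template-based question generation: treat a named entity
# as the answer, delete its span, and prepend a wh-word chosen by type.
# The wh-word table and spaCy model choice are our assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")

WH_BY_ENTITY = {"PERSON": "Who", "DATE": "When",
                "GPE": "Where", "ORG": "What organization"}

def template_questions(sentence):
    """Yield (question, answer) pairs from entity spans in the sentence."""
    doc = nlp(sentence)
    for ent in doc.ents:
        wh = WH_BY_ENTITY.get(ent.label_)
        if wh is None:
            continue
        # Cloze-style: remove the answer span, keep the rest as the body.
        body = (sentence[:ent.start_char] + sentence[ent.end_char:]).strip(" ,.")
        yield f"{wh} {body}?", ent.text

for q, a in template_questions("Gustave Eiffel designed the tower in 1887."):
    print(q, "->", a)
```

The resulting questions are deliberately naive and often ungrammatical (cloze-style), which matches the entry's point: even simple templated questions over retrieved sentences can serve as useful pseudo-training data for QA.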