A Lightweight Method to Generate Unanswerable Questions in English
- URL: http://arxiv.org/abs/2310.19403v1
- Date: Mon, 30 Oct 2023 10:14:52 GMT
- Title: A Lightweight Method to Generate Unanswerable Questions in English
- Authors: Vagrant Gautam, Miaoran Zhang, Dietrich Klakow
- Abstract summary: We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
- Score: 18.323248259867356
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: If a question cannot be answered with the available information, robust
systems for question answering (QA) should know _not_ to answer. One way to
build QA models that do this is with additional training data comprised of
unanswerable questions, created either by employing annotators or through
automated methods for unanswerable question generation. To show that the model
complexity of existing automated approaches is not justified, we examine a
simpler data augmentation method for unanswerable question generation in
English: performing antonym and entity swaps on answerable questions. Compared
to the prior state-of-the-art, data generated with our training-free and
lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data
with BERT-large), and has higher human-judged relatedness and readability. We
quantify the raw benefits of our approach compared to no augmentation across
multiple encoder models, using different amounts of generated data, and also on
TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish
swaps as a simple but strong baseline for future work.
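The two swap operations named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `ANTONYMS` lexicon, the function names `antonym_swap` and `entity_swap`, and the idea that entity lists are supplied by the caller. This listing does not describe how the authors find antonyms or entities, or whether they filter the generated questions, so this should not be read as their implementation.

```python
import random

# Hypothetical toy antonym lexicon (an assumption for this sketch);
# a real system would draw on a proper lexical resource.
ANTONYMS = {
    "largest": "smallest",
    "first": "last",
    "before": "after",
    "highest": "lowest",
    "oldest": "youngest",
}


def antonym_swap(question):
    """Replace the first word with a known antonym; return None if none applies."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        core = tok.strip("?,.")  # ignore trailing punctuation when looking up
        ant = ANTONYMS.get(core.lower())
        if ant is not None:
            tokens[i] = tok.replace(core, ant)
            return " ".join(tokens)
    return None


def entity_swap(question, question_entities, context_entities, rng):
    """Replace one question entity with a different entity from the context.

    Both entity lists are assumed to be given (e.g. by some NER step that is
    outside the scope of this sketch). Returns None if no swap is possible.
    """
    for ent in question_entities:
        candidates = [
            c for c in context_entities if c != ent and c not in question
        ]
        if ent in question and candidates:
            return question.replace(ent, rng.choice(candidates), 1)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(antonym_swap("What is the largest city in Germany?"))
    print(entity_swap("When did Germany adopt the euro?",
                      ["Germany"], ["Germany", "France"], rng))
```

A swapped question is only a candidate: nothing in this sketch checks that the new question is actually unanswerable from the context, so some form of validation would still be needed before using the output as training data.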
Related papers
- QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball).
QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples.
We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- QUADRo: Dataset and Models for QUestion-Answer Database Retrieval [97.84448420852854]
Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large-scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the BING search engine.
arXiv Detail & Related papers (2023-03-30T00:42:07Z)
- Improving Question Answering with Generation of NQ-like Questions [12.276281998447079]
Question Answering (QA) systems require a large amount of annotated data which is costly and time-consuming to gather.
We propose an algorithm to automatically generate shorter questions resembling day-to-day human communication in the Natural Questions (NQ) dataset from longer trivia questions in the Quizbowl (QB) dataset.
arXiv Detail & Related papers (2022-10-12T21:36:20Z)
- When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised [0.0]
Question Answering (QA) is key to enabling robust communication between humans and machines.
Modern language models used for QA have surpassed human performance in several essential tasks.
This paper studies augmenting human-made datasets with synthetic data as a way of surmounting this problem.
arXiv Detail & Related papers (2020-10-04T15:56:44Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.