Tell Me How to Ask Again: Question Data Augmentation with Controllable
Rewriting in Continuous Space
- URL: http://arxiv.org/abs/2010.01475v1
- Date: Sun, 4 Oct 2020 03:13:46 GMT
- Title: Tell Me How to Ask Again: Question Data Augmentation with Controllable
Rewriting in Continuous Space
- Authors: Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv,
Nan Duan and Ming Zhou
- Abstract summary: We propose Controllable Rewriting based Question Data Augmentation (CRQDA) for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.
We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples.
- Score: 94.8320535537798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel data augmentation method, referred to as
Controllable Rewriting based Question Data Augmentation (CRQDA), for machine
reading comprehension (MRC), question generation, and question-answering
natural language inference tasks. We treat the question data augmentation task
as a constrained question rewriting problem to generate context-relevant,
high-quality, and diverse question data samples. CRQDA utilizes a Transformer
autoencoder to map the original discrete question into a continuous embedding
space. It then uses a pre-trained MRC model to revise the question
representation iteratively with gradient-based optimization. Finally, the
revised question representations are mapped back into the discrete space,
serving as additional question data. Comprehensive experiments on SQuAD 2.0,
SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of
CRQDA.
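As a minimal sketch of the rewriting step, assuming a trained Transformer autoencoder that exposes a decode method and a frozen MRC model that accepts question embeddings directly (both stand-ins are hypothetical, not the released implementation):

```python
# Gradient-based question rewriting in continuous space: optimize the
# question embedding against a frozen MRC model's loss, then decode it.
import torch
import torch.nn.functional as F

def rewrite_question(q_emb, context, autoencoder, mrc_model, target,
                     steps=20, lr=1e-2):
    emb = q_emb.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([emb], lr=lr)        # updates the embedding only
    for _ in range(steps):
        opt.zero_grad()
        logits = mrc_model(emb, context)        # e.g., answerable vs. not
        F.cross_entropy(logits, target).backward()
        opt.step()                              # MRC weights stay frozen
    return autoencoder.decode(emb)              # map back to discrete tokens
```

Only the continuous representation is updated; decoding it with the autoencoder yields the revised question, which serves as an additional training sample.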
Related papers
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model (a minimal sketch of this filtering step appears after the list).
mQG shows promising results across various evaluation metrics against strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball).
QASnowball can iteratively generate large-scale, high-quality QA data based on a seed set of supervised examples (a schematic of the loop appears after the list).
We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
- Improving Question Answering with Generation of NQ-like Questions [12.276281998447079]
Question Answering (QA) systems require a large amount of annotated data which is costly and time-consuming to gather.
We propose an algorithm to automatically generate shorter questions resembling day-to-day human communication in the Natural Questions (NQ) dataset from longer trivia questions in the Quizbowl (QB) dataset.
arXiv Detail & Related papers (2022-10-12T21:36:20Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a <passage, answer> pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition, and semantic role labeling (a rough sketch of this transformation appears after the list).
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning [7.2745835227138045]
We propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers.
This dataset is then used to train a QA pair generation model whose answers are article summaries balancing brevity with sufficiency, generated jointly with their corresponding questions.
arXiv Detail & Related papers (2021-09-10T06:34:55Z)
- Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data [58.36305373100518]
It is unclear whether subject-area question-answering data is useful for machine reading comprehension tasks.
We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA.
We use incomplete and noisy snippets returned by a web search engine as the relevant context for each question-answering instance to convert it into a weakly-labeled MRC instance.
arXiv Detail & Related papers (2021-02-01T23:18:58Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from Stack Exchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Simplifying Paragraph-level Question Generation via Transformer Language Models [0.0]
Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text.
A single Transformer-based unidirectional language model leveraging transfer learning can be used to produce high-quality questions.
Our QG model, finetuned from GPT-2 Small, outperforms several paragraph-level QG baselines on the SQuAD dataset by 0.95 METEOR points.
arXiv Detail & Related papers (2020-05-03T14:57:24Z)
- Break It Down: A Question Understanding Benchmark [79.41678884521801]
We introduce a Question Decomposition Meaning Representation (QDMR) for questions.
QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question (an illustrative decomposition appears after the list).
We release the Break dataset, containing over 83K pairs of questions and their QDMRs.
arXiv Detail & Related papers (2020-01-31T11:04:52Z)
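For the mQG entry above, here is a minimal sketch of the answerability filter, assuming the HuggingFace transformers library; the model checkpoint and confidence threshold are illustrative choices, not necessarily the paper's exact setup:

```python
# Keep a generated question only if a SQuAD2.0-fine-tuned QA model
# predicts a non-empty answer with sufficient confidence.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def is_answerable(question, context, threshold=0.5):
    pred = qa(question=question, context=context, handle_impossible_answer=True)
    # SQuAD2.0-style models return an empty answer string for "no answer".
    return pred["answer"] != "" and pred["score"] >= threshold
```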
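For the QASnowball entry, a schematic of the bootstrapping loop; `finetune`, `generate_qa_pairs`, and `score` are hypothetical helpers standing in for the paper's generator training, sampling, and filtering components:

```python
# Iterative bootstrapping: train on the current pool, generate candidates,
# keep only high-confidence QA pairs, and fold them back into the pool.
def snowball(seed_pairs, corpus, rounds=3, keep_threshold=0.9):
    pool = list(seed_pairs)
    for _ in range(rounds):
        generator = finetune(pool)                        # hypothetical helper
        candidates = generate_qa_pairs(generator, corpus) # hypothetical helper
        pool += [qa for qa in candidates if score(qa) >= keep_threshold]
    return pool
```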
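For the summarization-informed QG entry, a rough sketch of turning a declarative summary sentence into a question with spaCy; the actual pipeline also uses dependency parsing and semantic role labeling, and the wh-word mapping below is an illustrative simplification:

```python
# Replace a named entity with a wh-word chosen by its entity type,
# yielding a (question, answer) pair from a declarative sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
WH = {"PERSON": "Who", "DATE": "When", "GPE": "Where", "ORG": "What organization"}

def sentence_to_question(sentence):
    doc = nlp(sentence)
    for ent in doc.ents:
        if ent.label_ in WH:
            rest = (sentence[:ent.start_char] + sentence[ent.end_char:]).strip(" .")
            return f"{WH[ent.label_]} {rest}?", ent.text
    return None

# e.g., "Bill Gates founded Microsoft in 1975." ->
#       ("Who founded Microsoft in 1975?", "Bill Gates")
```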
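For the Break/QDMR entry, an illustrative decomposition (a constructed example, not drawn verbatim from the dataset); each step is a natural-language operation that may refer back to an earlier step by its index:

```python
# QDMR expresses a question as an ordered list of steps, where "#n"
# denotes the result of step n.
question = "How many touchdowns did the team that won the game score?"
qdmr = [
    "return the team that won the game",  # step 1
    "return touchdowns of #1",            # step 2
    "return the number of #2",            # step 3
]
```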