Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering
- URL: http://arxiv.org/abs/2004.11892v1
- Date: Fri, 24 Apr 2020 17:57:45 GMT
- Title: Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering
- Authors: Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, Bing Xiang
- Abstract summary: We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question Answering (QA) is in increasing demand as the amount of information
available online and the desire for quick access to this content grows. A
common approach to QA has been to fine-tune a pretrained language model on a
task-specific labeled dataset. This paradigm, however, relies on large-scale
human-labeled data, which is scarce and costly to obtain. We propose an unsupervised
approach to training QA models with generated pseudo-training data. We show
that generating questions for QA training by applying a simple template on a
related, retrieved sentence rather than the original context sentence improves
downstream QA performance by allowing the model to learn more complex
context-question relationships. Training a QA model on this data improves F1
on the SQuAD dataset by about 14% relative to a previous unsupervised model,
and by about 20% when the answer is a named entity, achieving state-of-the-art
performance on SQuAD for unsupervised QA.
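The abstract describes the pipeline only at a high level. The Python sketch below illustrates the idea under several assumptions that are not taken from the paper: a toy word-overlap retriever stands in for a real retrieval system (such as BM25), the answer's named-entity label is assumed to come from an external tagger, and the template is assumed to be "Wh + B + A + ?", where A and B are the parts of the retrieved sentence before and after the answer. `WH_WORDS`, `retrieve_related_sentence` and `template_question` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import re

# Hypothetical mapping from an NER label to a wh-word (assumed labels; fallback "What").
WH_WORDS = {"PERSON": "Who", "DATE": "When", "GPE": "Where", "LOC": "Where"}


def tokenize(text: str) -> set[str]:
    """Return the set of lowercased word tokens, enough for a toy overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_related_sentence(answer: str, context_sentence: str,
                              corpus: list[str]) -> str | None:
    """Toy retriever standing in for BM25 or similar: among corpus sentences that
    contain the answer but are not the context sentence itself, return the one
    sharing the most word types with the context sentence (None if there is none)."""
    candidates = [s for s in corpus if answer in s and s != context_sentence]
    if not candidates:
        return None
    ctx = tokenize(context_sentence)
    return max(candidates, key=lambda s: len(ctx & tokenize(s)))


def template_question(sentence: str, answer: str, ent_label: str) -> str | None:
    """Apply the assumed 'Wh + B + A + ?' template, where A is the text before the
    answer span and B the text after it."""
    start = sentence.find(answer)
    if start < 0:
        return None
    before = sentence[:start].strip(" ,;:")
    after = sentence[start + len(answer):].strip(" ,;:.!?")
    if before:
        # Crude heuristic: undo sentence-initial capitalization
        # (this also wrongly lowercases a sentence-initial proper noun).
        before = before[0].lower() + before[1:]
    wh = WH_WORDS.get(ent_label, "What")
    return " ".join(part for part in (wh, after, before) if part) + "?"


corpus = [
    "Penicillin was discovered by Alexander Fleming in 1928.",
    "In 1928, Alexander Fleming noticed mould killing bacteria in a Petri dish.",
    "Alexander Fleming grew up on a farm in Scotland.",
    "Penicillin is an antibiotic.",
]
context = corpus[0]  # the sentence the answer was originally found in
retrieved = retrieve_related_sentence("Alexander Fleming", context, corpus)
question = template_question(retrieved, "Alexander Fleming", "PERSON")
print(question)  # Who noticed mould killing bacteria in a Petri dish in 1928?
```

Applying the same template to the context sentence itself yields "Who in 1928 penicillin was discovered by?", which copies the context wording almost verbatim. The question built from the retrieved sentence shares far fewer words with the context, which is the kind of less trivial context-question relationship the abstract argues the QA model benefits from.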
Related papers
- QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball).
QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples.
We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
(arXiv, 2023-09-19T05:20:36Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
(arXiv, 2023-05-26T14:59:53Z)
- Long-Tailed Question Answering in an Open World
We define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
(arXiv, 2023-05-11T04:28:58Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation
Question Generation (QG) is the task of generating a plausible question for a <passage, answer> pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
(arXiv, 2021-09-16T13:08:43Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
(arXiv, 2020-05-28T08:26:06Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
(arXiv, 2020-05-06T15:56:06Z)