OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop
Approach
- URL: http://arxiv.org/abs/2102.12128v1
- Date: Wed, 24 Feb 2021 08:45:00 GMT
- Title: OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop
Approach
- Authors: Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji
Zhang, Haiqing Chen
- Abstract summary: We propose a model named OneStop to generate QA pairs from documents in a one-stop approach.
Specifically, questions and their corresponding answer spans are extracted simultaneously.
OneStop is much more efficient to train and deploy in industrial scenarios since it involves only one model to solve the complex QA generation task.
- Score: 11.057028572260064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale question-answer (QA) pairs are critical for advancing research
areas like machine reading comprehension and question answering. Constructing
QA pairs from documents requires determining how to ask a question and what the
corresponding answer is. Existing methods for QA pair generation usually
follow a pipeline approach. Namely, they first choose the most likely candidate
answer span and then generate the answer-specific question. This pipeline
approach, however, is undesired in mining the most appropriate QA pairs from
documents since it ignores the connection between question generation and
answer extraction, which may lead to incompatible QA pair generation, i.e., the
selected answer span may be inappropriate for question generation. Human
annotators, by contrast, consider the whole QA pair and the compatibility
between question and answer. Motivated by this observation, instead of the
conventional pipeline approach, we propose a model named OneStop that generates
QA pairs from documents in a one-stop approach. Specifically, questions and
their corresponding answer spans are extracted simultaneously, and the
processes of question generation and answer extraction mutually affect each
other. Additionally, OneStop is much more efficient to train and deploy in
industrial scenarios since it involves only one model to solve the complex
QA generation task. We conduct comprehensive experiments on three large-scale
machine reading comprehension datasets: SQuAD, NewsQA, and DuReader. The
experimental results demonstrate that our OneStop model outperforms the
baselines significantly regarding the quality of generated questions, quality
of generated question-answer pairs, and model efficiency.
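The abstract does not specify the model architecture, but the one-stop framing can be illustrated with a minimal sketch: instead of one model selecting an answer span and a second model generating a question for it, a single model emits both in one target sequence, so question and span are produced and scored jointly. The serialization format, function names, and example spans below are hypothetical, not the paper's actual implementation.

```python
# Minimal sketch (hypothetical format, not the paper's implementation):
# serialize a QA pair as one decoder target so a single model can
# generate question and answer span jointly rather than in a pipeline.

def encode_target(question: str, answer_start: int, answer_end: int) -> str:
    """Serialize a QA pair as a single decoder target sequence."""
    return f"{question} <sep> {answer_start} {answer_end}"

def decode_target(target: str) -> tuple[str, int, int]:
    """Recover the question and the answer span from the joint target."""
    question, span = target.split(" <sep> ")
    start, end = (int(t) for t in span.split())
    return question, start, end

document = "OneStop generates QA pairs from documents in a one-stop approach."
target = encode_target("What does OneStop generate?", 18, 26)
question, start, end = decode_target(target)
print(question)             # What does OneStop generate?
print(document[start:end])  # QA pairs
```

Because the span indices appear in the same target sequence as the question tokens, a sequence-level decoder score reflects the compatibility of the whole pair, which is the property the pipeline approach loses.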
Related papers
- An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to answer complex open-domain questions.
We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions.
arXiv Detail & Related papers (2021-09-21T17:59:31Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a given (passage, answer) pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
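The declarative-to-question transformation this entry describes can be sketched roughly as follows. This toy version substitutes a wh-word for a labeled entity via a simple lookup; the entity label and function names are illustrative assumptions, and the actual system relies on dependency parsing, NER, and semantic role labeling rather than pre-supplied annotations.

```python
# Rough sketch (not the authors' actual pipeline): rewrite a declarative
# summary sentence as a wh-question whose answer is a chosen entity.
# The entity span and its type are assumed given here; a real system
# would obtain them from dependency parsing, NER, and SRL.

WH_BY_TYPE = {"PERSON": "Who", "DATE": "When", "ORG": "What organization"}

def sentence_to_question(sentence: str, entity: str, entity_type: str):
    """Turn 'X did Y.' into '<Wh> did Y?' with X as the gold answer."""
    wh = WH_BY_TYPE.get(entity_type, "What")
    question = sentence.replace(entity, wh, 1).rstrip(".") + "?"
    return question, entity  # (synthetic question, gold answer)

q, a = sentence_to_question("Marie Curie won the Nobel Prize in 1903.",
                            "Marie Curie", "PERSON")
print(q)  # Who won the Nobel Prize in 1903?
print(a)  # Marie Curie
```

Pairs produced this way can then serve as weak supervision for a neural QG model, which is the training setup the entry describes.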
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning [7.2745835227138045]
We propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers.
This dataset is used to learn a QA pair generation model producing summaries as answers that balance brevity with sufficiency jointly with their corresponding questions.
arXiv Detail & Related papers (2021-09-10T06:34:55Z)
- GTM: A Generative Triple-Wise Model for Conversational Question Generation [36.33685095934868]
We propose a generative triple-wise model with hierarchical variations for open-domain conversational question generation (CQG).
Our method significantly improves the quality of questions in terms of fluency, coherence and diversity over competitive baselines.
arXiv Detail & Related papers (2021-06-07T14:07:07Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Fluent Response Generation for Conversational Question Answering [15.826109118064716]
We propose a method for situating responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses.
We use data augmentation to generate training data for an end-to-end system.
arXiv Detail & Related papers (2020-05-21T04:57:01Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)