Generating Diverse and Consistent QA pairs from Contexts with
Information-Maximizing Hierarchical Conditional VAEs
- URL: http://arxiv.org/abs/2005.13837v5
- Date: Mon, 15 Jun 2020 02:55:11 GMT
- Authors: Dong Bok Lee, Seanie Lee, Woo Tae Jeong, Donghwan Kim, Sung Ju Hwang
- Abstract summary: We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both QA-based evaluation and semi-supervised learning, using only a fraction of the data for training.
- Score: 62.71505254770827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most crucial challenges in question answering (QA) is the scarcity
of labeled data, since it is costly to obtain question-answer (QA) pairs for a
target text domain with human annotation. An alternative approach to tackle the
problem is to use automatically generated QA pairs from either the problem
context or from a large amount of unstructured texts (e.g., Wikipedia). In this
work, we propose a hierarchical conditional variational autoencoder (HCVAE) for
generating QA pairs given unstructured texts as contexts, while maximizing the
mutual information between generated QA pairs to ensure their consistency. We
validate our Information Maximizing Hierarchical Conditional Variational
AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the
performance of the QA model (BERT-base) using only the generated QA pairs
(QA-based evaluation) or by using both the generated and human-labeled pairs
(semi-supervised learning) for training, against state-of-the-art baseline
models. The results show that our model obtains impressive performance gains
over all baselines on both tasks, using only a fraction of data for training.
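
The abstract names two ingredients, a hierarchical conditional VAE and a mutual-information term between the generated question and answer, but does not state the training objective. As a rough sketch only, a generic objective of that shape could be written as below. All notation here is assumed for illustration and is not taken from the paper: c is the context, x the question, y the answer, z_x and z_y are latent variables for the question and the answer, and \lambda is a hypothetical weight on the mutual-information term.

```latex
% Illustrative sketch, not the authors' exact objective.
% Assumed notation: c = context, x = question, y = answer,
% z_x, z_y = question and answer latents, \lambda = weight on the MI term.
\begin{aligned}
\mathcal{L}(c, x, y) ={}& \mathbb{E}_{q(z_x, z_y \mid x, y, c)}
    \big[ \log p(x, y \mid z_x, z_y, c) \big] \\
 &- \mathrm{KL}\big( q(z_x, z_y \mid x, y, c) \,\|\, p(z_x, z_y \mid c) \big) \\
 &+ \lambda \, I(x; y),
\qquad
p(z_x, z_y \mid c) = p(z_x \mid c)\, p(z_y \mid z_x, c) \quad \text{(assumed hierarchy)}
\end{aligned}
```

The first two terms are the standard evidence lower bound of a conditional VAE, and the last term rewards QA pairs whose question and answer are informative about each other, which is the consistency property the abstract describes. The objective is maximized; the mutual-information term generally has to be estimated in practice, and the estimator is left unspecified here.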
Related papers
- Graph Guided Question Answer Generation for Procedural Question-Answering [29.169773816553153]
We introduce a method for generating exhaustive and high-quality training data for task-specific question answering (QA) models.
The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text.
We show that small models trained with our data achieve excellent performance on the target QA task, even exceeding that of GPT3 and ChatGPT.
Date: 2024-01-24T17:01:42Z
- QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering [48.25449258017601]
State-of-the-art approaches fine-tune language models on QA pairs constructed from CommonSense Knowledge Bases.
We propose QADYNAMICS, a training dynamics-driven framework for QA diagnostics and refinement.
Date: 2023-10-17T14:27:34Z
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
Date: 2023-09-21T16:51:30Z
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
Date: 2023-05-26T14:59:53Z
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
Date: 2023-04-24T15:46:26Z
- How to Build Robust FAQ Chatbot with Controllable Question Generator? [5.680871239968297]
We propose a high-quality, diverse, controllable method to generate adversarial samples with a semantic graph.
The fluent and semantically generated QA pairs fool our passage retrieval model successfully.
We find that the generated data set improves the generalizability of the QA model to the new target domain.
Date: 2021-11-18T12:54:07Z
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
Date: 2020-05-06T15:56:06Z
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
Date: 2020-04-24T17:57:45Z
This list is automatically generated from the titles and abstracts of the papers on this site.