Relation-Guided Pre-Training for Open-Domain Question Answering
- URL: http://arxiv.org/abs/2109.10346v1
- Date: Tue, 21 Sep 2021 17:59:31 GMT
- Title: Relation-Guided Pre-Training for Open-Domain Question Answering
- Authors: Ziniu Hu, Yizhou Sun, Kai-Wei Chang
- Abstract summary: We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions.
- Score: 67.86958978322188
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Answering complex open-domain questions requires understanding the latent
relations between the entities involved. However, we found that existing QA
datasets are extremely imbalanced in some types of relations, which hurts the
generalization performance on questions with long-tail relations. To remedy
this problem, we propose a Relation-Guided Pre-Training
(RGPT-QA) framework. We first generate a relational QA dataset covering a wide
range of relations from both Wikidata triplets and Wikipedia hyperlinks. We
then pre-train a QA model to infer the latent relations from the question and
to conduct extractive QA to get the target answer entity. We demonstrate that,
by pre-training with the proposed RGPT-QA technique, the popular open-domain QA model
Dense Passage Retriever (DPR) achieves 2.2%, 2.4%, and 6.3% absolute
improvement in Exact Match accuracy on Natural Questions, TriviaQA, and
WebQuestions, respectively. In particular, we show that RGPT-QA improves significantly on
questions with long-tail relations.
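The abstract reports results as absolute improvements in Exact Match (EM) accuracy. As a rough illustration of how that metric is usually computed, here is a minimal sketch assuming the common SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are hypothetical and the normalization is an assumption; the paper's own evaluation script may differ in detail.

```python
import re
import string
from typing import List, Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization, not taken from the paper itself.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def exact_match_accuracy(
    predictions: Sequence[str], references: Sequence[List[str]]
) -> float:
    """Percentage of questions whose prediction exactly matches a gold answer."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    preds = ["The Beatles", "london", "Rome"]
    golds = [["Beatles"], ["London, UK"], ["Rome", "Roma"]]
    print(exact_match_accuracy(preds, golds))
```

An "absolute" improvement is a difference in percentage points between two such scores, not a relative change: a model that moves from an EM of 40.0 to 42.2 has gained 2.2 points absolute.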
Related papers
- IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions [54.23087908182134]
We introduce the first large-scale counterfactual open-domain question-answering (QA) benchmark, named IfQA.
The IfQA dataset contains over 3,800 questions that were annotated by crowdworkers on relevant Wikipedia passages.
The unique challenges posed by the IfQA benchmark will push open-domain QA research on both retrieval and counterfactual reasoning fronts.
arXiv Detail & Related papers (2023-05-23T12:43:19Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a <passage, answer> pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- GTM: A Generative Triple-Wise Model for Conversational Question Generation [36.33685095934868]
We propose a generative triple-wise model with hierarchical variations for open-domain conversational question generation (CQG).
Our method significantly improves the quality of questions in terms of fluency, coherence and diversity over competitive baselines.
arXiv Detail & Related papers (2021-06-07T14:07:07Z)
- Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore capitalizing on domain-specific, topically relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.