When to Read Documents or QA History: On Unified and Selective Open-domain QA
- URL: http://arxiv.org/abs/2306.04176v1
- Date: Wed, 7 Jun 2023 06:03:39 GMT
- Title: When to Read Documents or QA History: On Unified and Selective Open-domain QA
- Authors: Kyungjae Lee, Sang-eun Han, Seung-won Hwang, Moontae Lee
- Abstract summary: This paper studies the problem of open-domain question answering, with the aim of answering a diverse range of questions leveraging knowledge resources.
Two types of sources, QA-pair and document corpora, have been actively leveraged, each with complementary strengths.
A natural follow-up is thus to leverage both models, yet naive pipelining or integration approaches have failed to bring additional gains over either model alone.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of open-domain question answering, with the aim of answering a diverse range of questions by leveraging knowledge resources. Two types of sources, QA-pair and document corpora, have been actively leveraged, each with complementary strengths. The former is highly precise when a paraphrase of the given question $q$ was seen and answered during training, and is often posed as a retrieval problem, while the latter generalizes better to unseen questions. A natural follow-up is thus to leverage both models, yet naive pipelining or integration approaches have failed to bring additional gains over either model alone. Our distinction is to interpret the problem as calibration: we estimate the confidence of predicted answers and use it as an indicator of when to answer from the document corpus versus the QA-pair corpus. The effectiveness of our method is validated on widely adopted benchmarks such as Natural Questions and TriviaQA.
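To make the selection mechanism concrete, here is a minimal Python sketch of confidence-based routing between the two sources. The sigmoid temperature scaling, temperature values, and model interfaces are our own placeholder assumptions, not the paper's implementation.

```python
import math

def calibrated_confidence(score: float, temperature: float) -> float:
    """Map a raw answer score to a calibrated probability via
    temperature scaling with a sigmoid -- one common calibration
    recipe; the paper's exact calibrator may differ."""
    return 1.0 / (1.0 + math.exp(-score / temperature))

def selective_answer(question, qa_pair_model, doc_reader_model,
                     t_qa=2.0, t_doc=1.5):
    """Route between the two knowledge sources: answer from the
    QA-pair corpus when its calibrated confidence wins (e.g., the
    question paraphrases one seen in training), else fall back to
    the document reader. Both models are assumed to return an
    (answer, raw_score) pair; temperatures would be fit on
    held-out data."""
    ans_qa, s_qa = qa_pair_model(question)
    ans_doc, s_doc = doc_reader_model(question)
    if calibrated_confidence(s_qa, t_qa) >= calibrated_confidence(s_doc, t_doc):
        return ans_qa
    return ans_doc
```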
Related papers
- Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization [67.92796510359595]
Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus.
It is still unclear how well an OpenQA model can transfer to completely new knowledge domains.
We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy that mitigates knowledge over-memorization.
arXiv Detail & Related papers (2024-04-02T05:44:50Z)
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering [34.70206857546496]
Question answering models commonly have access to two sources of "knowledge" at inference time.
It is unclear whether a predicted answer stems from the given non-parametric knowledge or from knowledge stored in the model's parameters.
We propose a new paradigm in which QA models are trained to disentangle the two sources of knowledge.
arXiv Detail & Related papers (2022-11-10T15:34:44Z)
- Context Generation Improves Open Domain Question Answering [102.34183939011352]
We propose a two-stage, closed-book QA framework that employs a coarse-to-fine approach to extract relevant knowledge and answer a question.
Our method better exploits the knowledge stored in pretrained LMs without adding extra learnable parameters or requiring finetuning; a sketch of this two-stage idea follows this entry.
arXiv Detail & Related papers (2022-10-12T16:00:50Z)
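A minimal sketch of the coarse-to-fine, generate-then-answer pipeline described above; `lm_generate` and both prompts are hypothetical stand-ins, not the paper's actual prompts or API.

```python
def two_stage_answer(question: str, lm_generate) -> str:
    """Coarse-to-fine closed-book QA: first have the LM recall
    relevant background knowledge (coarse stage), then answer
    conditioned on that generated context (fine stage).
    `lm_generate` is any prompt-in/text-out callable."""
    # Stage 1: elicit the LM's stored knowledge as generated context.
    context = lm_generate(f"Generate background knowledge for: {question}")
    # Stage 2: answer using the question plus the generated context.
    return lm_generate(f"Context: {context}\nQuestion: {question}\nAnswer:")
```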
- ASQA: Factoid Questions Meet Long-Form Answers [35.11889930792675]
This work focuses on factoid questions that are ambiguous, that is, have different correct answers depending on interpretation.
Answers to ambiguous questions should synthesize factual information from multiple sources into a long-form summary.
We use this notion of correctness to define an automated metric of performance for ASQA.
arXiv Detail & Related papers (2022-04-12T21:58:44Z)
- Multifaceted Improvements for Conversational Open-domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading (a generic sketch follows this entry).
Second, the added post-ranker module pushes more relevant passages to the top placements, to be selected for the reader under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden-passage settings of training and inference, and encourages the reader to find the true answer without golden-passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a <passage, answer> pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition, and semantic role labeling (a simplified sketch follows this entry).
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
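A heavily simplified version of the declarative-to-question transformation might look like the spaCy sketch below. It uses only NER plus the entity's dependency relation, whereas the paper also uses full dependency parsing and semantic role labeling; the wh-word mapping and subject restriction are our own simplifications.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # requires the small English model

# Illustrative wh-word mapping for a few entity types.
WH_BY_ENT = {"PERSON": "Who", "ORG": "What organization"}

def sentence_to_qa(sentence: str):
    """Yield (question, answer) pairs by replacing a subject named
    entity with a wh-word -- a crude stand-in for the paper's
    parsing/NER/SRL pipeline."""
    doc = nlp(sentence)
    for ent in doc.ents:
        wh = WH_BY_ENT.get(ent.label_)
        # Restrict to subject entities so the wh-word reads naturally.
        if wh is None or ent.root.dep_ not in ("nsubj", "nsubjpass"):
            continue
        question = sentence.replace(ent.text, wh, 1).rstrip(".") + "?"
        yield question, ent.text

# e.g. "Marie Curie won the Nobel Prize in 1903." ->
#      ("Who won the Nobel Prize in 1903?", "Marie Curie")
```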
- SituatedQA: Incorporating Extra-Linguistic Contexts into QA [7.495151447459443]
We introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context.
We find that a significant proportion of information seeking questions have context-dependent answers.
Our study shows that existing models struggle with producing answers that are frequently updated or from uncommon locations.
arXiv Detail & Related papers (2021-09-13T17:53:21Z)
- A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering [15.355557454305776]
We show that question rewriting (QR) of the conversational context helps shed light on the relationship between question reformulation and answer selection.
We present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets.
arXiv Detail & Related papers (2020-10-13T06:29:51Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA (a sketch of this loop follows this entry).
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
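A minimal sketch of such a harvest-then-refine loop; the confidence threshold, round count, and `qa_model` interface are our own placeholders rather than the paper's actual procedure.

```python
def refine_qa_pairs(qa_pairs, qa_model, min_score=0.5, rounds=2):
    """Iteratively refine harvested (question, context, answer)
    triples: when the QA model confidently predicts a different
    answer, adopt its prediction. `qa_model(question, context)`
    is assumed to return an (answer, score) pair."""
    for _ in range(rounds):
        refined = []
        for question, context, answer in qa_pairs:
            pred, score = qa_model(question, context)
            if score >= min_score and pred != answer:
                answer = pred  # replace a weak harvested answer
            refined.append((question, context, answer))
        qa_pairs = refined
    return qa_pairs
```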