End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training
- URL: http://arxiv.org/abs/2012.01414v1
- Date: Wed, 2 Dec 2020 18:59:59 GMT
- Title: End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training
- Authors: Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avi
Sil, Vittorio Castelli, Radu Florian, Salim Roukos
- Abstract summary: End-to-end question answering requires both information retrieval and machine reading comprehension.
Recent work has successfully trained neural IR systems using only supervised question answering (QA) examples from open-domain datasets.
We combine our neural IR and MRC systems and show significant improvements in end-to-end QA on the CORD-19 collection.
- Score: 13.731352294133211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end question answering (QA) requires both information retrieval (IR)
over a large document collection and machine reading comprehension (MRC) on the
retrieved passages. Recent work has successfully trained neural IR systems
using only supervised question answering examples from open-domain
datasets. However, despite impressive performance on Wikipedia, neural IR lags
behind traditional term matching approaches such as BM25 in more specific and
specialized target domains such as COVID-19. Furthermore, given little or no
labeled data, effective adaptation of QA systems can also be challenging in
such target domains. In this work, we explore the application of synthetically
generated QA examples to improve performance on closed-domain retrieval and
MRC. We combine our neural IR and MRC systems and show significant improvements
in end-to-end QA on the CORD-19 collection over a state-of-the-art open-domain
QA baseline.
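As a concrete illustration of the retrieve-then-read pipeline described in the abstract, here is a minimal sketch assuming the Hugging Face sentence-transformers and transformers libraries; the model names and toy passages are illustrative stand-ins rather than the authors' actual CORD-19 components, and the synthetic question-generation step used for domain adaptation is omitted.

```python
# Minimal retrieve-then-read sketch (not the authors' system).
# Assumes: sentence-transformers and transformers are installed; model names are illustrative.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Toy target-domain passage collection (a stand-in for CORD-19).
corpus = [
    "Coronaviruses are enveloped RNA viruses that can cause respiratory illness in humans.",
    "BM25 is a term-matching ranking function widely used in information retrieval.",
]

# 1) Neural IR: encode passages and the question into dense vectors and rank by similarity.
encoder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")  # illustrative dual encoder
passage_emb = encoder.encode(corpus, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 1):
    q_emb = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, passage_emb, top_k=top_k)[0]
    return [corpus[hit["corpus_id"]] for hit in hits]

# 2) MRC: extract an answer span from the top retrieved passage.
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

question = "What kind of viruses are coronaviruses?"
context = retrieve(question)[0]
print(reader(question=question, context=context))  # e.g. {'answer': 'enveloped RNA viruses', ...}
```

In the paper's setting, both the retriever and the reader would additionally be fine-tuned on synthetically generated target-domain QA pairs before being chained as above.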
Related papers
- Enhancing Retrieval in QA Systems with Derived Feature Association [0.0]
Retrieval augmented generation (RAG) has become the standard in long context question answering (QA) systems.
We propose a novel extension to RAG systems, which we call Retrieval from AI Derived Documents (RAIDD).
arXiv Detail & Related papers (2024-10-02T05:24:49Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
- Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
- Better Retrieval May Not Lead to Better Question Answering [59.1892787017522]
A popular approach to improve the system's performance is to improve the quality of the retrieved context from the IR stage.
We show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reasoning, this common approach is surprisingly ineffective.
arXiv Detail & Related papers (2022-05-07T16:59:38Z)
- Synthetic Target Domain Supervision for Open Retrieval QA [24.48364368847857]
We stress-test the Dense Passage Retriever (DPR) on closed and specialized target domains such as COVID-19.
DPR lags behind standard BM25 in this important real-world setting.
In experiments, this noisy but fully automated target domain supervision gives DPR a sizable advantage over BM25.
arXiv Detail & Related papers (2022-04-20T06:28:13Z)
- Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce the modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)
- Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method across different cross-domain scenarios on the StackExchange, Spring, and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z)
- Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNNs) perform image classification by activating dominant features that correlate with labels.
We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNNs to out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data and forces the network to activate the remaining features that correlate with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
- Dense Passage Retrieval for Open-Domain Question Answering [49.028342823838486]
We show that retrieval can be practically implemented using dense representations alone.
Our dense retriever outperforms a strong Lucene-BM25 system by a large margin of 9%-19% absolute in top-20 passage retrieval accuracy.
arXiv Detail & Related papers (2020-04-10T04:53:17Z)
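For reference, a minimal sketch of the dual-encoder scoring used by DPR (an inner product between a question embedding and passage embeddings), assuming the public open-domain DPR checkpoints shipped with Hugging Face transformers; the passages are illustrative only, and this is not the domain-adapted retriever evaluated above.

```python
# DPR-style dense scoring sketch: rank passages by question-passage inner product.
# Assumes: transformers and torch are installed; checkpoints are the public NQ-trained DPR models.
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = [
    "Remdesivir was evaluated as a treatment for hospitalized COVID-19 patients.",
    "BM25 ranks documents using term frequency and inverse document frequency.",
]
question = "Which drug was evaluated as a COVID-19 treatment?"

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output                # shape (1, 768)
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output  # shape (2, 768)

scores = (q_emb @ p_emb.T).squeeze(0)  # inner-product relevance scores, one per passage
for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.2f}  {passage}")
```

Ranking by this inner product is the dense-retrieval criterion that the papers above compare against BM25 in specialized domains such as COVID-19.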