Dense Passage Retrieval for Open-Domain Question Answering
- URL: http://arxiv.org/abs/2004.04906v3
- Date: Wed, 30 Sep 2020 21:27:13 GMT
- Title: Dense Passage Retrieval for Open-Domain Question Answering
- Authors: Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
- Abstract summary: We show that retrieval can be practically implemented using dense representations alone.
Our dense retriever outperforms a strong Lucene-BM25 system by a large margin of 9%-19% absolute in terms of top-20 passage retrieval accuracy.
- Score: 49.028342823838486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain question answering relies on efficient passage retrieval to
select candidate contexts, where traditional sparse vector space models, such
as TF-IDF or BM25, are the de facto method. In this work, we show that
retrieval can be practically implemented using dense representations alone,
where embeddings are learned from a small number of questions and passages by a
simple dual-encoder framework. When evaluated on a wide range of open-domain QA
datasets, our dense retriever outperforms a strong Lucene-BM25 system by a large
margin of 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our
end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.
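The abstract describes the core mechanism: two independent encoders map questions and passages into the same vector space, and retrieval reduces to a maximum inner-product search over pre-computed passage vectors. Below is a minimal sketch of that dual-encoder scoring using the pre-trained DPR checkpoints published on the Hugging Face Hub; the checkpoint names, the `transformers` API usage, and the toy passages are assumptions of this sketch, not material from the paper itself.

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Assumed checkpoint names on the Hugging Face Hub (single-dataset models trained on Natural Questions).
CTX = "facebook/dpr-ctx_encoder-single-nq-base"
Q = "facebook/dpr-question_encoder-single-nq-base"
ctx_tok, ctx_enc = DPRContextEncoderTokenizer.from_pretrained(CTX), DPRContextEncoder.from_pretrained(CTX)
q_tok, q_enc = DPRQuestionEncoderTokenizer.from_pretrained(Q), DPRQuestionEncoder.from_pretrained(Q)

passages = [
    "Paris is the capital and most populous city of France.",
    "BM25 is a bag-of-words ranking function used by search engines.",
]

with torch.no_grad():
    # Passages are embedded once; at corpus scale these vectors go into an ANN index (e.g. FAISS).
    p_emb = ctx_enc(**ctx_tok(passages, padding=True, truncation=True, return_tensors="pt")).pooler_output
    # The question is embedded at query time with a separate encoder.
    q_emb = q_enc(**q_tok("What is the capital of France?", return_tensors="pt")).pooler_output

scores = (q_emb @ p_emb.T).squeeze(0)               # inner-product similarity, one score per passage
top = torch.topk(scores, k=min(20, len(passages)))  # top-20 passages, matching the paper's evaluation cutoff
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{score:.2f}  {passages[idx]}")
```

The top-20 retrieval accuracy quoted above is computed by checking, for each question, whether any of its 20 highest-scoring passages contains the answer string; the paper serves the pre-computed passage vectors from a FAISS index rather than scoring every passage on the fly as in this toy example.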
Related papers
- Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge [82.5582220249183]
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set.
Our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, against tables and passages from Wikipedia.
arXiv Detail & Related papers (2022-10-22T03:21:32Z)
- End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training [13.731352294133211]
End-to-end question answering requires both information retrieval and machine reading comprehension.
Recent work has successfully trained neural IR systems using only supervised question answering (QA) examples from open-domain datasets.
We combine our neural IR and MRC systems and show significant improvements in end-to-end QA on the CORD-19 collection.
arXiv Detail & Related papers (2020-12-02T18:59:59Z)
- Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Sprint, and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z)
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval [24.77260903221371]
We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering.
SPARTA learns a sparse representation that can be efficiently implemented as an inverted index (a minimal sketch of this style of lookup follows after this list).
We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks.
arXiv Detail & Related papers (2020-09-28T02:11:02Z)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
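As a contrast to the dense approach sketched earlier, the SPARTA entry in this list relies on sparse representations served from an inverted index. The following is a minimal sketch of that style of lookup; the per-passage term weights are invented numbers standing in for whatever a learned model (SPARTA) or a scoring formula (BM25) would assign, and only the index structure and score accumulation are the point.

```python
from collections import defaultdict

# Toy sparse passage representations: term -> weight. In SPARTA or BM25 these
# weights come from a model or a scoring formula; here they are made-up values
# used only to illustrate the inverted-index lookup.
passage_vectors = {
    "p1": {"covid": 2.1, "vaccine": 1.4},
    "p2": {"retrieval": 1.8, "dense": 0.9},
    "p3": {"covid": 0.7, "retrieval": 1.2},
}

# Inverted index: term -> postings list of (passage_id, weight).
inverted_index = defaultdict(list)
for pid, vector in passage_vectors.items():
    for term, weight in vector.items():
        inverted_index[term].append((pid, weight))

def search(query_terms, top_k=2):
    """Accumulate weights of matching terms and return the highest-scoring passages."""
    scores = defaultdict(float)
    for term in query_terms:
        for pid, weight in inverted_index.get(term, []):
            scores[pid] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

print(search(["covid", "retrieval"]))  # p1 and p3 rank highest for this toy query
```

Only passages sharing at least one weighted term with the query are ever touched, which is what makes inverted-index retrieval cheap; a dense retriever instead compares the query vector against every passage vector, or against an approximate nearest-neighbor index built over them.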