Dense Passage Retrieval for Open-Domain Question Answering
- URL: http://arxiv.org/abs/2004.04906v3
- Date: Wed, 30 Sep 2020 21:27:13 GMT
- Title: Dense Passage Retrieval for Open-Domain Question Answering
- Authors: Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
- Abstract summary: We show that retrieval can be practically implemented using dense representations alone.
Our dense retriever outperforms a strong Lucene-BM25 system by a large margin of 9%-19% absolute in terms of top-20 passage retrieval accuracy.
- Score: 49.028342823838486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain question answering relies on efficient passage retrieval to
select candidate contexts, where traditional sparse vector space models, such
as TF-IDF or BM25, are the de facto method. In this work, we show that
retrieval can be practically implemented using dense representations alone,
where embeddings are learned from a small number of questions and passages by a
simple dual-encoder framework. When evaluated on a wide range of open-domain QA
datasets, our dense retriever outperforms a strong Lucene-BM25 system by a large
margin of 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our
end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.
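The abstract describes the core mechanism: two independent encoders map questions and passages into the same vector space, and retrieval reduces to a maximum inner-product search over pre-computed passage vectors. Below is a minimal sketch of that dual-encoder scoring using the pre-trained DPR checkpoints published on the Hugging Face Hub; the checkpoint names, the `transformers` API usage, and the toy passages are assumptions of this sketch, not material from the paper itself.

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Assumed checkpoint names on the Hugging Face Hub (single-dataset models trained on Natural Questions).
CTX = "facebook/dpr-ctx_encoder-single-nq-base"
Q = "facebook/dpr-question_encoder-single-nq-base"
ctx_tok, ctx_enc = DPRContextEncoderTokenizer.from_pretrained(CTX), DPRContextEncoder.from_pretrained(CTX)
q_tok, q_enc = DPRQuestionEncoderTokenizer.from_pretrained(Q), DPRQuestionEncoder.from_pretrained(Q)

passages = [
    "Paris is the capital and most populous city of France.",
    "BM25 is a bag-of-words ranking function used by search engines.",
]

with torch.no_grad():
    # Passages are embedded once; at corpus scale these vectors go into an ANN index (e.g. FAISS).
    p_emb = ctx_enc(**ctx_tok(passages, padding=True, truncation=True, return_tensors="pt")).pooler_output
    # The question is embedded at query time with a separate encoder.
    q_emb = q_enc(**q_tok("What is the capital of France?", return_tensors="pt")).pooler_output

scores = (q_emb @ p_emb.T).squeeze(0)               # inner-product similarity, one score per passage
top = torch.topk(scores, k=min(20, len(passages)))  # top-20 passages, matching the paper's evaluation cutoff
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{score:.2f}  {passages[idx]}")
```

The top-20 retrieval accuracy quoted above is computed by checking, for each question, whether any of its 20 highest-scoring passages contains the answer string; the paper serves the pre-computed passage vectors from a FAISS index rather than scoring every passage on the fly as in this toy example.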
Related papers
- Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge [82.5582220249183]
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set.
Our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, against tables and passages from Wikipedia.
arXiv Detail & Related papers (2022-10-22T03:21:32Z)
- End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training [13.731352294133211]
End-to-end question answering requires both information retrieval and machine reading comprehension.
Recent work has successfully trained neural IR systems using only supervised question answering (QA) examples from open-domain datasets.
We combine our neural IR and MRC systems and show significant improvements in end-to-end QA on the CORD-19 collection.
arXiv Detail & Related papers (2020-12-02T18:59:59Z)
- Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection [72.01292864036087]
Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Sprint, and Quora datasets.
arXiv Detail & Related papers (2020-11-22T19:19:33Z)
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval [24.77260903221371]
We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering.
SPARTA learns a sparse representation that can be efficiently implemented as an inverted index (a minimal sketch of this style of lookup follows after this list).
We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks.
arXiv Detail & Related papers (2020-09-28T02:11:02Z)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
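As a contrast to the dense approach sketched earlier, the SPARTA entry in this list relies on sparse representations served from an inverted index. The following is a minimal sketch of that style of lookup; the per-passage term weights are invented numbers standing in for whatever a learned model (SPARTA) or a scoring formula (BM25) would assign, and only the index structure and score accumulation are the point.

```python
from collections import defaultdict

# Toy sparse passage representations: term -> weight. In SPARTA or BM25 these
# weights come from a model or a scoring formula; here they are made-up values
# used only to illustrate the inverted-index lookup.
passage_vectors = {
    "p1": {"covid": 2.1, "vaccine": 1.4},
    "p2": {"retrieval": 1.8, "dense": 0.9},
    "p3": {"covid": 0.7, "retrieval": 1.2},
}

# Inverted index: term -> postings list of (passage_id, weight).
inverted_index = defaultdict(list)
for pid, vector in passage_vectors.items():
    for term, weight in vector.items():
        inverted_index[term].append((pid, weight))

def search(query_terms, top_k=2):
    """Accumulate weights of matching terms and return the highest-scoring passages."""
    scores = defaultdict(float)
    for term in query_terms:
        for pid, weight in inverted_index.get(term, []):
            scores[pid] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

print(search(["covid", "retrieval"]))  # p1 and p3 rank highest for this toy query
```

Only passages sharing at least one weighted term with the query are ever touched, which is what makes inverted-index retrieval cheap; a dense retriever instead compares the query vector against every passage vector, or against an approximate nearest-neighbor index built over them.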