A Replication Study of Dense Passage Retriever
- URL: http://arxiv.org/abs/2104.05740v1
- Date: Mon, 12 Apr 2021 18:10:39 GMT
- Title: A Replication Study of Dense Passage Retriever
- Authors: Xueguang Ma, Kai Sun, Ronak Pradeep, and Jimmy Lin
- Abstract summary: We study the dense passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for end-to-end open-domain question answering.
We present a replication study of this work, starting with model checkpoints provided by the authors.
We are able to improve end-to-end question answering effectiveness using exactly the same models as in the original work.
- Score: 32.192420072129636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text retrieval using learned dense representations has recently emerged as a
promising alternative to "traditional" text retrieval using sparse bag-of-words
representations. One recent work that has garnered much attention is the dense
passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for
end-to-end open-domain question answering. We present a replication study of
this work, starting with model checkpoints provided by the authors, but
otherwise from an independent implementation in our group's Pyserini IR toolkit
and PyGaggle neural text ranking library. Although our experimental results
largely verify the claims of the original paper, we arrived at two important
additional findings that contribute to a better understanding of DPR: First, it
appears that the original authors under-report the effectiveness of the BM25
baseline and hence also dense-sparse hybrid retrieval results. Second, by
incorporating evidence from the retriever and an improved answer span scoring
technique, we are able to improve end-to-end question answering effectiveness
using exactly the same models as in the original work.
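The first finding matters beyond the baseline itself: in dense-sparse hybrid retrieval, the BM25 and dense ranked lists are fused with a weighted combination, so an under-reported BM25 score also distorts the hybrid numbers. The sketch below illustrates this kind of weighted linear fusion; the function name, the zero backfill for missing scores, and the default weight are illustrative assumptions, not the paper's or Pyserini's exact implementation.

```python
from typing import Dict, List, Tuple


def hybrid_fusion(
    sparse_hits: Dict[str, float],   # docid -> BM25 score
    dense_hits: Dict[str, float],    # docid -> dense retriever (DPR) score
    alpha: float = 1.0,              # weight on the sparse score (hypothetical default)
    k: int = 100,
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse rankings: score = dense + alpha * sparse."""
    fused: Dict[str, float] = {}
    for docid in set(sparse_hits) | set(dense_hits):
        # A document absent from one list contributes 0.0 for that component;
        # real implementations often backfill with the list's minimum score instead.
        fused[docid] = dense_hits.get(docid, 0.0) + alpha * sparse_hits.get(docid, 0.0)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Example: fuse two tiny ranked lists.
print(hybrid_fusion({"d1": 12.3, "d2": 9.8}, {"d1": 78.1, "d3": 75.4}, alpha=1.2, k=3))
```

In practice the weight alpha is tuned on a development set, which is exactly where a miscalibrated BM25 baseline would skew the tuned hybrid.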
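The second finding concerns end-to-end answer extraction: rather than ranking candidate answers by the reader's span score alone, evidence from the retriever is folded into the final score. A minimal sketch of that idea follows; the linear interpolation and all names are assumptions for illustration, not the paper's exact scoring technique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpanCandidate:
    text: str             # candidate answer span
    span_score: float     # reader's score for this span
    passage_score: float  # retriever's score for the source passage


def select_answer(candidates: List[SpanCandidate], beta: float = 0.5) -> str:
    """Rank spans by a weighted sum of reader and retriever scores.

    beta = 0 recovers reader-only scoring; the interpolation is an
    illustrative stand-in for the improved span scoring described above.
    """
    best = max(
        candidates,
        key=lambda c: (1.0 - beta) * c.span_score + beta * c.passage_score,
    )
    return best.text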
Related papers
- QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval [12.225881591629815]
In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching.
Recent studies mainly focus on improving the sentence embedding model or retrieval process.
We introduce a novel text augmentation framework for dense retrieval, which transforms raw documents into information-dense text formats.
arXiv Detail & Related papers (2024-07-29T17:39:08Z)
- Topic-DPR: Topic-based Prompts for Dense Passage Retrieval [6.265789210037749]
We present Topic-DPR, a dense passage retrieval model that uses topic-based prompts.
We introduce a novel positive and negative sampling strategy, leveraging semi-structured data to boost dense retrieval efficiency.
arXiv Detail & Related papers (2023-10-10T13:45:24Z)
- Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo)
Our proposed RACo significantly outperforms other knowledge-enhanced methods.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval.
We first examine the relevant reproducibility concerns and explain why the focus is on image-text retrieval tasks.
We then analyze various aspects of reproducing pretrained and non-pretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z)
- Learning to Retrieve Passages without Supervision [58.31911597824848]
Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs.
We investigate whether dense retrievers can be learned in a self-supervised fashion, and applied effectively without any annotations.
arXiv Detail & Related papers (2021-12-14T19:18:08Z)
- Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations [48.046725390986595]
We propose a Contextual Fine-to-Coarse (CFC) distilled model for coarse-grained response selection in open-domain conversations.
To evaluate the performance of our proposed model, we construct two new datasets based on the Reddit comments dump and Twitter corpus.
arXiv Detail & Related papers (2021-09-24T08:22:35Z)
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [51.004601358498135]
Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
arXiv Detail & Related papers (2021-08-19T16:53:43Z)
- On Single and Multiple Representations in Dense Passage Retrieval [30.303705563808386]
Two dense retrieval families have become apparent: single representation and multiple representation.
This paper contributes a direct study of their comparative effectiveness, noting situations where each method under- or over-performs relative to the other and to a BM25 baseline.
We also show that multiple representations obtain better improvements than single representations for queries that are the hardest for BM25, as well as for definitional queries.
arXiv Detail & Related papers (2021-08-13T15:01:53Z)
- Joint Passage Ranking for Diverse Multi-Answer Retrieval [56.43443577137929]
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a question.
This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer.
In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms.
arXiv Detail & Related papers (2021-04-17T04:48:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.