Lexically-Accelerated Dense Retrieval
- URL: http://arxiv.org/abs/2307.16779v1
- Date: Mon, 31 Jul 2023 15:44:26 GMT
- Title: Lexically-Accelerated Dense Retrieval
- Authors: Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
- Abstract summary: 'LADR' (Lexically-Accelerated Dense Retrieval) is a simple-yet-effective approach that improves the efficiency of existing dense retrieval models.
LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
- Score: 29.327878974130055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval approaches that score documents based on learned dense vectors
(i.e., dense retrieval) rather than lexical signals (i.e., conventional
retrieval) are increasingly popular. Their ability to identify related
documents that do not necessarily contain the same terms as those appearing in
the user's query (thereby improving recall) is one of their key advantages.
However, to actually achieve these gains, dense retrieval approaches typically
require an exhaustive search over the document collection, making them
considerably more expensive at query-time than conventional lexical approaches.
Several techniques aim to reduce this computational overhead by approximating
the results of a full dense retriever. Although these approaches reasonably
approximate the top results, they suffer in terms of recall -- one of the key
advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense
Retrieval), a simple-yet-effective approach that improves the efficiency of
existing dense retrieval models without compromising on retrieval
effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval
exploration that uses a document proximity graph. We explore two variants of
LADR: a proactive approach that expands the search space to the neighbors of
all seed documents, and an adaptive approach that selectively searches the
documents with the highest estimated relevance in an iterative fashion. Through
extensive experiments across a variety of dense retrieval models, we find that
LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier
among approximate k nearest neighbor techniques. Further, we find that when
tuned to take around 8ms per query in retrieval latency on our hardware, LADR
consistently achieves both precision and recall that are on par with an
exhaustive search on standard benchmarks.
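The abstract describes LADR's mechanism in prose: lexical retrieval supplies seed documents, and a precomputed document proximity graph guides where dense scoring is spent. The sketch below is a minimal NumPy illustration of the two variants under stated assumptions; the toy corpus, the brute-force graph construction, the `budget` cutoff, and the hard-coded seeds (which in LADR would come from a fast lexical retriever such as BM25) are all illustrative, not the authors' implementation.

```python
# Minimal sketch of LADR's proactive and adaptive graph exploration.
# Toy data, brute-force graph, and budget are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, DIM, K_NEIGHBORS = 1000, 32, 8

doc_vecs = rng.normal(size=(N, DIM)).astype(np.float32)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

# Document proximity graph: each document links to its nearest neighbors
# (precomputed offline; brute force here only for simplicity).
sims = doc_vecs @ doc_vecs.T
np.fill_diagonal(sims, -np.inf)
graph = np.argsort(-sims, axis=1)[:, :K_NEIGHBORS]

def proactive_ladr(query, seeds, k=10):
    """Score the seeds plus all of their graph neighbors in one pass."""
    candidates = set(seeds)
    for d in seeds:
        candidates.update(graph[d])
    cand = np.fromiter(candidates, dtype=int)
    scores = doc_vecs[cand] @ query
    order = np.argsort(-scores)[:k]
    return cand[order], scores[order]

def adaptive_ladr(query, seeds, k=10, budget=200):
    """Iteratively expand only the highest-scoring frontier document."""
    scored = {int(d): float(doc_vecs[d] @ query) for d in seeds}
    frontier = set(scored)
    while frontier and len(scored) < budget:
        best = max(frontier, key=lambda d: scored[d])
        frontier.discard(best)
        for nb in graph[best]:
            nb = int(nb)
            if nb not in scored:
                scored[nb] = float(doc_vecs[nb] @ query)
                frontier.add(nb)
    return sorted(scored.items(), key=lambda kv: -kv[1])[:k]

query = rng.normal(size=DIM).astype(np.float32)
query /= np.linalg.norm(query)
seeds = [1, 42, 77]  # stand-in for the top results of a lexical retriever
print(proactive_ladr(query, seeds)[0])
print(adaptive_ladr(query, seeds))
```

The trade-off mirrors the abstract: the proactive variant scores one batched neighborhood expansion, while the adaptive variant spends its budget only on the most promising frontier documents.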
Related papers
- PairDistill: Pairwise Relevance Distillation for Dense Retrieval [35.067998820937284]
This paper introduces Pairwise Relevance Distillation (PairDistill) to leverage pairwise reranking.
It offers fine-grained distinctions between similarly relevant documents to enrich the training of dense retrieval models.
Our experiments demonstrate that PairDistill outperforms existing methods, achieving new state-of-the-art results across multiple benchmarks.
arXiv Detail & Related papers (2024-10-02T09:51:42Z)
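The PairDistill summary above centers on distilling a reranker's pairwise preferences into a dense retriever. Below is a hedged sketch of one such objective, a cross-entropy between the teacher's preference and the retriever's implied preference; the paper's exact loss may differ.

```python
# Hedged sketch of a pairwise relevance distillation loss; the exact
# objective in PairDistill may differ from this cross-entropy form.
import numpy as np

def pairwise_distill_loss(s_a, s_b, teacher_pref):
    """teacher_pref: reranker's probability that doc a beats doc b.
    The retriever's implied preference is sigmoid(s_a - s_b)."""
    student_pref = 1.0 / (1.0 + np.exp(-(s_a - s_b)))
    eps = 1e-9
    return -(teacher_pref * np.log(student_pref + eps)
             + (1.0 - teacher_pref) * np.log(1.0 - student_pref + eps))

# Teacher strongly prefers doc a; the student is currently indifferent.
print(pairwise_distill_loss(s_a=0.0, s_b=0.0, teacher_pref=0.9))
```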
- LexBoost: Improving Lexical Document Retrieval with Nearest Neighbors [37.64658206917278]
LexBoost builds a network of dense neighbors (a corpus graph) using a dense retrieval approach at indexing time.
We consider both a document's lexical relevance scores and its neighbors' scores to rank the documents.
We show that re-ranking on top of LexBoost outperforms traditional dense re-ranking and leads to results comparable with higher-latency exhaustive dense retrieval.
arXiv Detail & Related papers (2024-08-25T18:11:37Z)
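The LexBoost entry above fuses a document's own lexical score with its corpus-graph neighbors' scores. A minimal sketch, assuming a simple linear interpolation; the weight `lam` and the mean over neighbors are illustrative choices:

```python
# Minimal sketch of LexBoost-style score fusion over a corpus graph.
# The linear interpolation and lam value are illustrative assumptions.
import numpy as np

def lexboost_scores(lex_scores, graph, lam=0.5):
    """lex_scores: (N,) lexical relevance scores for one query.
    graph: (N, K) indices of each document's dense nearest neighbors."""
    neighbor_mean = lex_scores[graph].mean(axis=1)
    return lam * lex_scores + (1.0 - lam) * neighbor_mean

lex = np.array([2.0, 0.5, 1.5, 0.0])
graph = np.array([[2, 1], [0, 2], [0, 3], [1, 2]])
print(lexboost_scores(lex, graph))
```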
- Early Exit Strategies for Approximate k-NN Search in Dense Retrieval [10.48678957367324]
We build upon the state of the art for early-exit A-kNN and propose an unsupervised method based on the notion of patience.
We show that our techniques improve A-kNN efficiency with up to 5x speedups while incurring negligible effectiveness losses.
arXiv Detail & Related papers (2024-08-09T10:17:07Z)
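The notion of patience above can be made concrete: stop scanning a ranked candidate list once the running top-k has stopped changing. The sketch below is an illustration under assumptions; the candidate ordering, patience threshold, and exit rule are not taken from the paper.

```python
# Hedged sketch of a patience-based early-exit rule for approximate k-NN.
import heapq
import numpy as np

def knn_with_patience(query, doc_vecs, candidate_order, k=10, patience=50):
    heap = []  # min-heap of (score, doc_id): the current top-k
    unchanged = 0
    for doc_id in candidate_order:
        score = float(doc_vecs[doc_id] @ query)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
            unchanged = 0
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break  # the top-k looks stable: exit early
    return sorted(heap, reverse=True)

rng = np.random.default_rng(0)
docs, q = rng.normal(size=(500, 16)), rng.normal(size=16)
order = range(500)  # in practice, an ANN index supplies this ordering
print(knn_with_patience(q, docs, order, k=5, patience=40))
```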
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show that Mixture-of-Logits (MoL) can be realized in practice and achieves superior performance across diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
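A Mixture-of-Logits similarity combines several component dot products through learned gating weights. The sketch below fixes illustrative shapes and a softmax gate; the paper's parameterization of the gates may differ.

```python
# Hedged sketch of a Mixture-of-Logits (MoL) learned similarity.
# Shapes and the softmax gating are illustrative assumptions.
import numpy as np

def mol_similarity(q_heads, d_heads, gate_logits):
    """q_heads, d_heads: (H, DIM) per-component embeddings.
    gate_logits: (H,) logits from a small gating network over (q, d)."""
    logits = np.einsum('hd,hd->h', q_heads, d_heads)  # per-component dots
    gates = np.exp(gate_logits - gate_logits.max())
    gates /= gates.sum()                              # softmax gate
    return float(gates @ logits)

rng = np.random.default_rng(0)
print(mol_similarity(rng.normal(size=(4, 8)),
                     rng.normal(size=(4, 8)),
                     rng.normal(size=4)))
```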
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments document representations through interpolation and perturbation.
We validate DAR on retrieval tasks with two benchmark datasets, showing that it significantly outperforms relevant baselines on the dense retrieval of both labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z)
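The two augmentations the DAR summary names, interpolation and perturbation of document representations, reduce to a few lines. The mixup-style mixing and the Gaussian noise scale below are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of DAR-style document-embedding augmentation.
import numpy as np

rng = np.random.default_rng(0)

def interpolate(doc_a, doc_b, alpha=None):
    """Mixup-style interpolation between two document embeddings."""
    alpha = rng.uniform() if alpha is None else alpha
    return alpha * doc_a + (1.0 - alpha) * doc_b

def perturb(doc, sigma=0.01):
    """Add small Gaussian noise to a document embedding."""
    return doc + rng.normal(scale=sigma, size=doc.shape)

a, b = rng.normal(size=8), rng.normal(size=8)
print(interpolate(a, b), perturb(a), sep="\n")
```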
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
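The ANCE-PRF loop above retrieves with the initial query embedding and then re-encodes the query together with its top feedback documents. In the sketch below, `prf_encoder` stands in for the paper's BERT encoder; the toy averaging encoder is purely illustrative.

```python
# Hedged sketch of a PRF query-refinement loop in the spirit of ANCE-PRF.
import numpy as np

def prf_search(query_vec, doc_vecs, prf_encoder, k_feedback=3, k=10):
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k_feedback]
    # ANCE-PRF feeds the query and top documents' *texts* to a BERT
    # encoder; here that step is abstracted as a function of embeddings.
    refined = prf_encoder(query_vec, doc_vecs[top])
    return np.argsort(-(doc_vecs @ refined))[:k]

# Toy stand-in encoder: average the query with its feedback documents.
toy_encoder = lambda q, fb: (q + fb.mean(axis=0)) / 2.0

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 16))
print(prf_search(docs[0] + 0.1, docs, toy_encoder, k=5))
```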
- Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.