Phrase Retrieval Learns Passage Retrieval, Too
- URL: http://arxiv.org/abs/2109.08133v1
- Date: Thu, 16 Sep 2021 17:42:45 GMT
- Title: Phrase Retrieval Learns Passage Retrieval, Too
- Authors: Jinhyuk Lee, Alexander Wettig, Danqi Chen
- Abstract summary: We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy than dedicated passage retrievers.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
- Score: 77.57208968326422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense retrieval methods have shown great promise over sparse retrieval
methods in a range of NLP problems. Among them, dense phrase retrieval, the most
fine-grained retrieval unit, is appealing because phrases can be directly used
as the output for question answering and slot filling tasks. In this work, we
follow the intuition that retrieving phrases naturally entails retrieving
larger text blocks and study whether phrase retrieval can serve as the basis
for coarse-level retrieval including passages and documents. We first observe
that a dense phrase-retrieval system, without any retraining, already achieves
better passage retrieval accuracy (+3-5% in top-5 accuracy) compared to passage
retrievers, which also helps achieve superior end-to-end QA performance with
fewer passages. Then, we provide an interpretation for why phrase-level
supervision helps learn better fine-grained entailment compared to
passage-level supervision, and also show that phrase retrieval can be improved
to achieve competitive performance in document-retrieval tasks such as entity
linking and knowledge-grounded dialogue. Finally, we demonstrate how phrase
filtering and vector quantization can reduce the size of our index by 4-10x,
making dense phrase retrieval a practical and versatile solution in
multi-granularity retrieval.
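The abstract's intuition that retrieving phrases "naturally entails retrieving larger text blocks" can be made concrete with a small sketch: map each retrieved phrase back to the passage that contains it and score the passage by its best phrase. This is only an illustration of the idea, not the DensePhrases implementation; the function and variable names are hypothetical.

```python
# Hedged sketch: turning phrase-level hits into a passage-level ranking.
def aggregate_to_passages(phrase_hits, top_k=5):
    """phrase_hits: iterable of (passage_id, score) pairs for retrieved phrases.
    Each passage is scored by the best-scoring phrase it contains."""
    best = {}
    for passage_id, score in phrase_hits:
        if passage_id not in best or score > best[passage_id]:
            best[passage_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Example: five phrase hits drawn from three passages.
hits = [("p1", 71.2), ("p2", 74.8), ("p1", 69.0), ("p3", 65.4), ("p2", 60.1)]
print(aggregate_to_passages(hits, top_k=2))  # [('p2', 74.8), ('p1', 71.2)]
```

The 4-10x index reduction mentioned above comes from filtering out low-quality phrases and quantizing the remaining vectors. A generic product-quantization example with faiss gives a feel for the compression trade-off; the library choice and the quantization parameters below are assumptions, not the paper's configuration.

```python
# Hedged sketch: compressing a dense index with product quantization (faiss).
import numpy as np
import faiss

d = 128                                         # vector dimension (illustrative)
vectors = np.random.rand(100_000, d).astype("float32")

# 16 sub-quantizers x 8 bits = 16 bytes/vector vs. 512 bytes for raw float32,
# i.e. a 32x reduction of the stored codes (at some cost in recall).
index = faiss.IndexPQ(d, 16, 8)
index.train(vectors)
index.add(vectors)

scores, ids = index.search(vectors[:3], 5)      # top-5 neighbors for 3 queries
print(ids.shape)                                # (3, 5)
```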
Related papers
- Improve Dense Passage Retrieval with Entailment Tuning [22.39221206192245]
Key to a retrieval system is calculating relevance scores for query-passage pairs.
We observe that a major class of relevance aligns with the concept of entailment in NLI tasks.
We design a method called entailment tuning to improve the embedding of dense retrievers.
arXiv Detail & Related papers (2024-10-21T09:18:30Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
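The entry above names the training objective but not its form; the sketch below is a generic in-batch contrastive (InfoNCE-style) loss over query and phrase embeddings, meant only to illustrate "contrastive learning" in this setting, not CCPR's actual code or hyperparameters.

```python
# Hedged sketch: in-batch contrastive loss for a query/phrase bi-encoder.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, phrase_emb, temperature=0.05):
    """The i-th query's positive is the i-th phrase; every other phrase in
    the batch serves as an in-batch negative."""
    q = F.normalize(query_emb, dim=-1)            # (B, d)
    p = F.normalize(phrase_emb, dim=-1)           # (B, d)
    logits = q @ p.T / temperature                # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Example with random embeddings for a batch of 8 aligned query/phrase pairs.
print(in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768)))
```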
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Lexically-Accelerated Dense Retrieval [29.327878974130055]
LADR (Lexically-Accelerated Dense Retrieval) is a simple yet effective approach that improves the efficiency of existing dense retrieval models.
LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
Analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the majority of errors (53.5%) stem from missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Bridging the Training-Inference Gap for Dense Phrase Retrieval [104.4836127502683]
Building dense retrievers requires a series of standard procedures, including training and validating neural models.
In this paper, we explore how the gap between training and inference in dense retrieval can be reduced.
We propose an efficient way of validating dense retrievers using a small subset of the entire corpus.
arXiv Detail & Related papers (2022-10-25T00:53:06Z)
- LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval [68.85686621130111]
We propose to align a dense retriever with a well-performing lexicon-aware representation model.
We evaluate our model on three public benchmarks, showing that with a comparable lexicon-aware retriever as the teacher, our proposed dense model brings consistent and significant improvements.
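The "alignment" described above can be read as a form of distillation; a generic version pushes the dense retriever's score distribution over candidate passages toward the lexicon-aware teacher's. This is a hedged illustration of that idea, not LED's actual objective, and all names are hypothetical.

```python
# Hedged sketch: aligning a dense student with a lexicon-aware teacher via KL.
import torch
import torch.nn.functional as F

def alignment_loss(dense_scores, lexicon_scores, temperature=1.0):
    """dense_scores, lexicon_scores: (num_queries, num_candidates) relevance
    scores from the student and teacher over the same candidate passages."""
    student = F.log_softmax(dense_scores / temperature, dim=-1)
    teacher = F.softmax(lexicon_scores / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Example: 2 queries scored over 10 shared candidate passages.
print(alignment_loss(torch.randn(2, 10), torch.randn(2, 10)))
```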
arXiv Detail & Related papers (2022-08-29T15:09:28Z)
- Learning Dense Representations of Phrases at Scale [22.792942611601347]
We show for the first time that we can learn dense phrase representations alone that achieve much stronger performance in open-domain QA.
Our model, DensePhrases, improves upon previous phrase retrieval models by 15%-25% absolute accuracy.
Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs.
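The throughput claim above rests on the fact that search over pure dense representations reduces to batched maximum inner product search; the sketch below shows a NumPy version of that operation. The array sizes and names are illustrative, not taken from the DensePhrases codebase (which additionally relies on filtering and quantization at much larger scale).

```python
# Hedged sketch: batched maximum inner product search over phrase vectors.
import numpy as np

phrase_vecs = np.random.rand(200_000, 128).astype("float32")  # toy phrase index

def search(question_vecs, top_k=5):
    """One matmul scores every question against every phrase;
    argpartition then keeps the top-k phrase ids per question."""
    scores = question_vecs @ phrase_vecs.T                       # (Q, N)
    top = np.argpartition(-scores, top_k, axis=1)[:, :top_k]     # unsorted top-k
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)                # sorted top-k ids

print(search(np.random.rand(16, 128).astype("float32")).shape)  # (16, 5)
```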
arXiv Detail & Related papers (2020-12-23T12:28:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.