Related papers: Phrase Retrieval Learns Passage Retrieval, Too

URL: http://arxiv.org/abs/2109.08133v1
Date: Thu, 16 Sep 2021 17:42:45 GMT
Title: Phrase Retrieval Learns Passage Retrieval, Too
Authors: Jinhyuk Lee, Alexander Wettig, Danqi Chen
Abstract summary: We study whether phrase retrieval can serve as the basis for coarse-level retrieval, including passages and documents. We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy than dedicated passage retrievers. We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
Score: 77.57208968326422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dense retrieval methods have shown great promise over sparse retrieval methods in a range of NLP problems. Among them, dense phrase retrieval (the most fine-grained retrieval unit) is appealing because phrases can be directly used as the output for question answering and slot filling tasks. In this work, we follow the intuition that retrieving phrases naturally entails retrieving larger text blocks and study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents. We first observe that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy (+3-5% in top-5 accuracy) compared to passage retrievers, which also helps achieve superior end-to-end QA performance with fewer passages. Then, we provide an interpretation for why phrase-level supervision helps learn better fine-grained entailment compared to passage-level supervision, and also show that phrase retrieval can be improved to achieve competitive performance in document-retrieval tasks such as entity linking and knowledge-grounded dialogue. Finally, we demonstrate how phrase filtering and vector quantization can reduce the size of our index by 4-10x, making dense phrase retrieval a practical and versatile solution in multi-granularity retrieval.
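The idea that "retrieving phrases naturally entails retrieving larger text blocks" can be illustrated with a small sketch. It assumes each retrieved phrase carries the id of the passage it came from and that a passage is scored by its single best phrase (max pooling); the abstract does not state the exact aggregation rule, so treat that as an assumption. The hypothetical `top_k_accuracy` helper counts a question as a hit if any of its gold passages appears in the top k; answer-string matching is another common way to define top-k accuracy in open-domain QA.

```python
def passage_scores_from_phrases(phrase_hits):
    """Turn (passage_id, phrase_score) hits into a ranked passage list.

    Assumption: a passage's score is the maximum score of any phrase in it.
    """
    best = {}
    for pid, score in phrase_hits:
        if pid not in best or score > best[pid]:
            best[pid] = score
    # Highest score first; ties broken by passage id so the order is deterministic.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


def top_k_accuracy(ranked_per_question, gold_per_question, k):
    """Fraction of questions with at least one gold passage in the top k."""
    hits = 0
    for ranked, gold in zip(ranked_per_question, gold_per_question):
        top = {pid for pid, _ in ranked[:k]}
        if top & gold:
            hits += 1
    return hits / len(gold_per_question)


if __name__ == "__main__":
    hits = [("p1", 0.9), ("p2", 0.7), ("p1", 0.4), ("p3", 0.8)]
    ranked = passage_scores_from_phrases(hits)
    print(ranked)                                     # [('p1', 0.9), ('p3', 0.8), ('p2', 0.7)]
    print(top_k_accuracy([ranked], [{"p3"}], k=1))    # 0.0
    print(top_k_accuracy([ranked], [{"p3"}], k=2))    # 1.0
```

Because no passage-level model is involved, the phrase index is reused unchanged, which matches the "without any retraining" framing above.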

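To give a feel for where index-size savings from vector quantization come from, here is a sketch of symmetric int8 scalar quantization. This is an illustrative stand-in, not the paper's scheme (the abstract does not describe it): each float32 value (4 bytes) becomes a 1-byte code, plus one float32 scale per vector. The function names and the 768-dimensional example are hypothetical.

```python
def quantize_int8(vec):
    """Symmetric scalar quantization: int8 codes in [-127, 127] plus one float scale."""
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid division by zero for all-zero vectors
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction; each value is off by at most scale / 2."""
    return [c * scale for c in codes]


def index_bytes(num_vectors, dim, bytes_per_value, per_vector_overhead=0):
    """Raw storage for a flat index, ignoring ids and metadata."""
    return num_vectors * (dim * bytes_per_value + per_vector_overhead)


if __name__ == "__main__":
    fp32 = index_bytes(1000, 768, 4)
    int8 = index_bytes(1000, 768, 1, per_vector_overhead=4)
    print(fp32, int8, round(fp32 / int8, 2))  # 3072000 772000 3.98
```

Scalar int8 alone gives just under 4x; reaching the upper end of the reported 4-10x range would require stronger compression (for example product quantization) combined with phrase filtering, which removes vectors rather than shrinking them.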
Related papers

ELITE: Embedding-Less retrieval with Iterative Text Exploration [5.8851517822935335]
Large Language Models (LLMs) have achieved impressive progress in natural language processing. Their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks.
arXiv Detail & Related papers (2025-05-17T08:48:43Z)
Improve Dense Passage Retrieval with Entailment Tuning [22.39221206192245]
The key to a retrieval system is calculating relevance scores for query-passage pairs. We observed that a major class of relevance aligns with the concept of entailment in NLI tasks. We design a method called entailment tuning to improve the embedding of dense retrievers.
arXiv Detail & Related papers (2024-10-21T09:18:30Z)
A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval [14.389703823471574]
We propose the use of a neural query rewriter to generate multiple queries and show how to integrate those queries in the passage retrieval pipeline efficiently. The main strength of our approach lies in its simplicity: it leverages how the beam search algorithm works and can produce multiple query rewrites at no additional cost.
arXiv Detail & Related papers (2024-06-27T07:43:03Z)
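One way to use several beam-search rewrites is to retrieve with each and merge the ranked lists. The summary above does not say how the queries are integrated, so this sketch swaps in reciprocal rank fusion (RRF) with the conventional constant k=60 as an assumed fusion method; document ids and rankings are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists: each list adds 1 / (k + rank) to every doc it contains."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical results for three rewrites taken from the final beam.
    per_rewrite = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(reciprocal_rank_fusion(per_rewrite))  # ['b', 'a', 'c', 'd']
```

RRF only needs ranks, not comparable scores, which makes it a convenient default when each rewrite is scored independently.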
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus. The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z)
Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval. We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
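Contrastive training of a retriever is commonly done with an InfoNCE-style loss using in-batch negatives. The sketch below shows that generic objective in plain Python; whether CCPR uses exactly this form, this temperature, or dot-product similarity is an assumption, and all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(queries, positives, temperature=0.05):
    """In-batch contrastive loss.

    positives[i] is the match for queries[i]; every other positives[j] in the
    batch acts as a negative. Returns the mean cross-entropy over the batch.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(q, [[1.0, 0.0], [0.0, 1.0]]))  # near 0: pairs aligned
    print(info_nce_loss(q, [[0.0, 1.0], [1.0, 0.0]]))  # about 20: pairs swapped
```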
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
Lexically-Accelerated Dense Retrieval [29.327878974130055]
'LADR' (Lexically-Accelerated Dense Retrieval) is a simple-yet-effective approach that improves the efficiency of existing dense retrieval models. LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
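A rough sketch of lexically seeded dense exploration: start from documents returned by a lexical retriever, score them with the dense model, then repeatedly score the nearest-neighbour-graph neighbours of the current top-k until the top-k stops changing. The seed list, the precomputed `neighbors` graph, and the `dense_score` callable are hypothetical inputs; this illustrates the general idea described above and is not LADR's published code.

```python
def seeded_graph_search(seed_ids, neighbors, dense_score, top_k=10, max_rounds=10):
    """Explore a document-proximity graph starting from lexical seeds.

    seed_ids:    doc ids from a lexical retriever (e.g. BM25), assumed given
    neighbors:   dict mapping doc id -> list of nearby doc ids (precomputed)
    dense_score: callable returning the dense relevance score of a doc id
    """
    scores = {d: dense_score(d) for d in seed_ids}
    expanded = set()
    for _ in range(max_rounds):
        top = sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
        frontier = [d for d in top if d not in expanded]
        if not frontier:  # every current top-k doc has been expanded: converged
            break
        for d in frontier:
            expanded.add(d)
            for n in neighbors.get(d, ()):
                if n not in scores:  # score each doc at most once
                    scores[n] = dense_score(n)
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": [], "x": []}
    dense = {"a": 0.1, "b": 0.5, "c": 0.9, "x": 0.2}
    print(seeded_graph_search(["a", "x"], graph, dense.__getitem__, top_k=2))  # ['c', 'b']
```

Only documents reachable from the seeds are ever scored, which is where the efficiency gain over exhaustive dense search comes from.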
Bridging the Training-Inference Gap for Dense Phrase Retrieval [104.4836127502683]
Building dense retrievers requires a series of standard procedures, including training and validating neural models. In this paper, we explore how the gap between training and inference in dense retrieval can be reduced. We propose an efficient way of validating dense retrievers using a small subset of the entire corpus.
arXiv Detail & Related papers (2022-10-25T00:53:06Z)
LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval [68.85686621130111]
We propose to make a dense retriever align with a well-performing lexicon-aware representation model. Evaluation on three public benchmarks shows that, with a comparable lexicon-aware retriever as the teacher, our proposed dense model brings consistent and significant improvements.
arXiv Detail & Related papers (2022-08-29T15:09:28Z)
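Aligning a dense student with a lexicon-aware teacher is often implemented as listwise distillation: turn both models' scores over the same candidate passages into distributions and minimize the KL divergence from teacher to student. This generic objective is an assumption for illustration; LED's actual alignment loss may differ, and the score values are hypothetical.

```python
import math


def softmax(xs, temperature=1.0):
    zs = [x / temperature for x in xs]
    m = max(zs)  # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kl(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.0]  # hypothetical lexicon-aware teacher scores
    print(listwise_kl(teacher, [2.9, 1.1, 0.0]))  # small: student agrees
    print(listwise_kl(teacher, [0.0, 1.0, 3.0]))  # large: ranking reversed
```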

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.