Drowning in Documents: Consequences of Scaling Reranker Inference
- URL: http://arxiv.org/abs/2411.11767v1
- Date: Mon, 18 Nov 2024 17:46:32 GMT
- Title: Drowning in Documents: Consequences of Scaling Reranker Inference
- Authors: Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov
- Abstract summary: Cross-encoders are often used to re-score the documents retrieved by cheaper initial IR systems.
We measure reranker performance for full retrieval, not just re-scoring first-stage retrieval.
Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents.
- Score: 35.499018267073964
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents and actually degrade quality beyond a certain limit. In fact, in this setting, rerankers can frequently assign high scores to documents with no lexical or semantic overlap with the query. We hope that our findings will spur future research to improve reranking.
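To make the setup concrete, here is a minimal sketch of reranking progressively deeper candidate pools with a cross-encoder, assuming the sentence-transformers library; the model name, query, and toy corpus are illustrative, not the paper's exact configuration.

```python
from sentence_transformers import CrossEncoder

# Illustrative model and corpus; in the paper's setting the candidate pool
# grows toward the full collection rather than a handful of documents.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "effects of scaling reranker inference"
corpus = [
    "Cross-encoders re-score candidates from a cheaper first-stage retriever.",
    "Recipe for sourdough bread with a long cold fermentation.",
    "Scoring more candidates can surface distractors that get mis-scored.",
    "Tourist guide to the canals of Amsterdam.",
]

for depth in (1, 2, len(corpus)):  # progressively deeper candidate pools
    pool = corpus[:depth]          # stand-in for first-stage retrieval at this depth
    scores = reranker.predict([(query, doc) for doc in pool])
    ranked = sorted(zip(scores, pool), reverse=True)
    print(f"depth={depth}:", [doc[:45] for _, doc in ranked[:2]])
```

The paper's observation is that as the depth grows, quality first improves and then degrades, because the reranker occasionally assigns high scores to unrelated documents that deeper pools inevitably contain.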
Related papers
- Breaking the Lens of the Telescope: Online Relevance Estimation over Large Retrieval Sets [15.549852480638066]
We propose online relevance estimation, a novel re-ranking paradigm that continuously updates relevance estimates for a query throughout the ranking process (one possible reading is sketched after this entry).
We validate our approach on TREC benchmarks under two scenarios: hybrid retrieval and adaptive retrieval.
arXiv Detail & Related papers (2025-04-12T22:05:50Z)
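One way to read online relevance estimation is as interleaving expensive scoring with cheap estimate updates for the not-yet-scored candidates. The sketch below assumes hypothetical `expensive_score` and `similarity` functions and is not the paper's actual estimator.

```python
def online_rerank(query, candidates, expensive_score, similarity, budget):
    # `expensive_score` and `similarity` are hypothetical stand-ins.
    estimates = {doc: 0.0 for doc in candidates}  # running relevance estimates
    scored = {}
    for _ in range(min(budget, len(candidates))):
        # spend the next unit of budget on the most promising unscored doc
        doc = max((d for d in candidates if d not in scored), key=estimates.get)
        scored[doc] = expensive_score(query, doc)
        # propagate the observed score to similar, still-unscored candidates
        for other in candidates:
            if other not in scored:
                estimates[other] += similarity(doc, other) * scored[doc]
    return sorted(scored, key=scored.get, reverse=True)
```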
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence [56.09494651178128]
Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG).
We show that retrievers often rely on superficial patterns like over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches.
We show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs.
arXiv Detail & Related papers (2025-03-06T23:23:13Z)
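The beginning-of-document bias described above is easy to probe: encode the same evidence sentence at the start versus the end of a passage and compare similarities. A minimal sketch assuming the sentence-transformers library; the model and texts are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative retriever

query = "Who wrote On the Origin of Species?"
evidence = "Charles Darwin wrote On the Origin of Species in 1859."
filler = "The first edition was printed in London and sold out quickly."

q = model.encode(query)
early = model.encode(f"{evidence} {filler}")  # evidence opens the document
late = model.encode(f"{filler} {evidence}")   # evidence buried at the end

print("evidence first:", util.cos_sim(q, early).item())
print("evidence last: ", util.cos_sim(q, late).item())
```

If the two similarities differ noticeably, the retriever is rewarding position rather than content, which is the kind of superficial pattern the paper documents.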
- Counterfactual Query Rewriting to Use Historical Relevance Feedback [25.893083499927776]
We propose approaches to rewrite user queries and compare them against a system that directly uses the previous qrels for the ranking.
We expand queries with terms extracted from the previously relevant documents or derive so-called keyqueries that rank the previously relevant documents to the top of the current corpus.
Our evaluation in the CLEF LongEval scenario shows that rewriting queries with historical relevance feedback improves the retrieval effectiveness and even outperforms computationally expensive transformer-based approaches.
arXiv Detail & Related papers (2025-02-06T09:05:41Z)
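A minimal sketch of the expansion idea above: append the most frequent content terms from previously relevant documents to the current query. The stopword list and term count are illustrative simplifications, not the paper's exact keyquery derivation.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def expand_query(query, previously_relevant_docs, num_terms=5):
    # count content terms across the documents judged relevant in the past
    query_terms = set(query.lower().split())
    counts = Counter(
        term
        for doc in previously_relevant_docs
        for term in doc.lower().split()
        if term not in STOPWORDS and term not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return query + " " + " ".join(expansion)
```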
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
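A pointwise sketch in the spirit of an agentic LLM judge: ask the model to reason about the query before giving a verdict, then keep only accepted documents. `call_llm` is a hypothetical stand-in for any prompt-in, text-out client, and the prompt is not JudgeRank's actual template.

```python
def judge_relevance(call_llm, query, document):
    # `call_llm` is a hypothetical prompt-in, text-out LLM client
    prompt = (
        "First restate what the query is really asking. Then judge whether "
        "the document answers it.\n"
        f"Query: {query}\nDocument: {document}\n"
        "Finish with a single line: VERDICT: relevant or VERDICT: not relevant."
    )
    reply = call_llm(prompt)
    return "verdict: relevant" in reply.lower()

def judge_rerank(call_llm, query, candidates):
    # keep only judged-relevant documents, preserving first-stage order
    return [doc for doc in candidates if judge_relevance(call_llm, query, doc)]
```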
- MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models [34.39053202801489]
In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts.
We propose a novel query rewriting method MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and generated results.
Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
arXiv Detail & Related papers (2024-08-30T07:57:30Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback that aligns well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
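One concrete way to use ranking feedback without annotations: reward each candidate rewrite by how highly a public reranker scores the documents it retrieves for the original information need. `retrieve` is a hypothetical first-stage retriever and the model name is illustrative; RaFe itself uses such feedback as a training signal rather than pure selection.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rewrite_reward(original_query, rewrite, retrieve, k=10):
    docs = retrieve(rewrite, k)  # retrieve with the candidate rewrite...
    scores = reranker.predict([(original_query, d) for d in docs])
    return float(scores.max())   # ...but judge results against the original query

def best_rewrite(original_query, candidate_rewrites, retrieve):
    return max(candidate_rewrites,
               key=lambda r: rewrite_reward(original_query, r, retrieve))
```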
- Can We Use Large Language Models to Fill Relevance Judgment Holes? [9.208308067952155]
We take initial steps towards extending existing test collections by employing Large Language Models (LLM) to fill the holes.
We find substantially lower correlations when human and automatic judgments are combined.
arXiv Detail & Related papers (2024-05-09T07:39:19Z)
- Lexically-Accelerated Dense Retrieval [29.327878974130055]
LADR (Lexically-Accelerated Dense Retrieval) is a simple yet effective approach that improves the efficiency of existing dense retrieval models.
LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
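The acceleration idea, sketched under stated assumptions: restrict dense scoring to lexical seed documents plus their neighbors in a precomputed document-proximity graph. `lexical_retrieve`, `neighbors`, and `dense_score` are hypothetical stand-ins for the components LADR composes, and this one-hop expansion omits LADR's iterative variants.

```python
def ladr_search(query, lexical_retrieve, neighbors, dense_score, seeds=10, k=10):
    # cheap lexical seeds anchor the search near plausibly relevant documents
    candidates = set(lexical_retrieve(query, seeds))
    for doc in list(candidates):
        candidates.update(neighbors(doc))  # one-hop expansion in the doc graph
    # expensive dense scoring touches only the expanded candidate set
    scored = {doc: dense_score(query, doc) for doc in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```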
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by having it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
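A Rocchio-style simplification of that feedback loop: softmax the reranker's scores over the top documents and nudge the query embedding toward their weighted centroid before retrieving again. ReFIT itself distills the reranker's score distribution into the query vector; this sketch only captures the general shape.

```python
import numpy as np

def refine_query_vector(q_vec, doc_vecs, reranker_scores, alpha=0.5):
    # softmax over reranker scores -> weights on the top retrieved documents
    weights = np.exp(reranker_scores - np.max(reranker_scores))
    weights /= weights.sum()
    feedback = (weights[:, None] * doc_vecs).sum(axis=0)
    # interpolate between the original query and the reranker-preferred region
    return (1 - alpha) * q_vec + alpha * feedback
```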
- Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection.
Knowledge distillation has become the de facto way to improve a retriever, training it to mimic a heterogeneous yet powerful cross-encoder.
We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
arXiv Detail & Related papers (2022-12-20T17:00:36Z)
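The distillation objective can be sketched as matching the student retriever's score distribution over a query's candidates to the cross-encoder teacher's, e.g. with a listwise KL term; FGD's fine-grained alignment is richer than this plain form.

```python
import torch
import torch.nn.functional as F

def listwise_distillation_loss(student_scores, teacher_scores, temperature=1.0):
    # both tensors hold relevance scores for the same query's candidate list
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_p = F.softmax(teacher_scores / temperature, dim=-1)
    # KL(teacher || student), summed over the candidate list
    return F.kl_div(student_logp, teacher_p, reduction="sum")
```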
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
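The PRF step above can be sketched as re-encoding the query together with the text of its top retrieved documents, so the refreshed embedding absorbs the feedback signal. A generic sentence encoder stands in for the trained ANCE-PRF encoder here, and the separator is an illustrative choice.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the PRF encoder

def prf_query_embedding(query, top_docs, sep=" [SEP] "):
    # concatenate the query with its pseudo-relevance-feedback documents
    prf_input = query + sep + sep.join(top_docs)
    return model.encode(prf_input)
```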