Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
- URL: http://arxiv.org/abs/2503.05037v1
- Date: Thu, 06 Mar 2025 23:23:13 GMT
- Title: Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
- Authors: Mohsen Fayyaz, Ali Modarressi, Hinrich Schuetze, Nanyun Peng
- Abstract summary: Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). We show that retrievers often rely on superficial patterns like over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches. We show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs.
- Score: 56.09494651178128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robustness is critical to avoid failures. In this work, by repurposing a relation extraction dataset (e.g., Re-DocRED), we design controlled experiments to quantify the impact of heuristic biases, such as favoring shorter documents, in retrievers like Dragon+ and Contriever. Our findings reveal significant vulnerabilities: retrievers often rely on superficial patterns like over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches. Additionally, they tend to overlook whether the document contains the query's answer, lacking deep semantic understanding. Notably, when multiple biases combine, models exhibit catastrophic performance degradation, selecting the answer-containing document in less than 3% of cases over a biased document without the answer. Furthermore, we show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs, resulting in a 34% performance drop compared to providing no documents at all.
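The pairwise setup described above is easy to reproduce in miniature: score one query against an answer-bearing document and a shorter, literally matching distractor, and see which the retriever prefers. The sketch below does this under stated assumptions: the Hugging Face checkpoint facebook/contriever (sentence-transformers falls back to mean pooling, which matches Contriever's intended usage), and invented query/document strings rather than the paper's Re-DocRED-derived test set.

```python
# Toy pairwise bias probe in the spirit of the paper's controlled experiments.
# Assumptions: checkpoint "facebook/contriever" with mean pooling; the
# query/documents are invented, not drawn from the paper's Re-DocRED setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("facebook/contriever")

query = "Who founded the company that makes the PlayStation?"

# Answer-bearing document: longer, with the evidence buried mid-passage.
answer_doc = (
    "The PlayStation line of consoles is produced by Sony Interactive "
    "Entertainment. Sony itself was founded by Masaru Ibuka and Akio "
    "Morita in Tokyo in 1946."
)

# Distractor: shorter, heavy literal overlap with the query, no answer.
biased_doc = "The PlayStation is made by the company that makes the PlayStation."

q_emb, a_emb, b_emb = model.encode([query, answer_doc, biased_doc])

print("answer-bearing doc:", util.dot_score(q_emb, a_emb).item())
print("biased distractor :", util.dot_score(q_emb, b_emb).item())
# If the distractor wins, the retriever is rewarding brevity and literal
# match over factual evidence, the failure mode quantified in the paper.
```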
Related papers
- Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
The paper proposes the Deliberate Thinking based Dense Retriever (DEBATER), which enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process.
Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
- Drowning in Documents: Consequences of Scaling Reranker Inference [35.499018267073964]
Cross-encoders are often used to re-score the documents retrieved by cheaper initial IR systems.
We measure reranker performance for full retrieval, not just re-scoring first-stage retrieval.
Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents (a sketch of the experiment's shape follows this entry).
arXiv Detail & Related papers (2024-11-18T17:46:32Z)
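A minimal sketch of the experiment's shape, not the paper's protocol: rerank candidate pools of growing depth with a public cross-encoder checkpoint (cross-encoder/ms-marco-MiniLM-L-6-v2, chosen here only for illustration) and track the rank of a known-relevant document. The synthetic filler documents stand in for a real first-stage ranking.

```python
# Sketch: does a cross-encoder's ranking of a known-relevant document degrade
# as the candidate pool deepens? Checkpoint and data are illustrative stand-ins.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "effects of caffeine on sleep"
relevant = (
    "Caffeine blocks adenosine receptors, delaying sleep onset "
    "and reducing total sleep time."
)
# Stand-in for progressively deeper first-stage retrieval output.
distractors = [f"Filler document number {i} about an unrelated topic." for i in range(1000)]

for depth in (10, 100, 1000):
    pool = [relevant] + distractors[:depth]
    scores = reranker.predict([(query, doc) for doc in pool])
    rank = int((scores > scores[0]).sum()) + 1  # how many documents outscore the relevant one
    print(f"pool depth {depth:4d} -> rank of relevant doc: {rank}")
```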
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z) - Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches; a hedged sketch of the corrective idea follows this entry.
arXiv Detail & Related papers (2024-01-29T04:36:39Z)
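The summary above does not spell out CRAG's mechanism, so the following is only a sketch of the corrective idea: an evaluator scores each retrieved document, and low-confidence retrievals trigger a fallback evidence source instead of being passed to the generator. Every callable and the threshold here is a hypothetical stand-in, not CRAG's released implementation.

```python
# Hedged sketch of a corrective retrieval gate. All callables are hypothetical
# stand-ins for components a CRAG-style pipeline would plug in.
from typing import Callable

def corrective_rag(
    query: str,
    retrieve: Callable[[str], list[str]],       # first-stage retriever (hypothetical)
    evaluate: Callable[[str, str], float],      # query-document quality score in [0, 1] (hypothetical)
    fallback: Callable[[str], list[str]],       # alternative evidence source, e.g. web search (hypothetical)
    generate: Callable[[str, list[str]], str],  # LLM answer generator (hypothetical)
    threshold: float = 0.5,                     # illustrative cutoff, not a tuned value
) -> str:
    docs = retrieve(query)
    trusted = [d for d in docs if evaluate(query, d) >= threshold]
    if not trusted:
        # Retrieval judged unreliable: correct it rather than mislead the LLM.
        trusted = fallback(query)
    return generate(query, trusted)
```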
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification [21.04611844009438]
We propose a novel retrieve-and-rerank method for multi-hop retrieval.
It consists of a retriever that jointly scores documents in the knowledge source and sentences from previously retrieved documents.
It is guided by a proof system that dynamically terminates the retrieval process once the evidence is deemed sufficient (a schematic sketch follows this entry).
arXiv Detail & Related papers (2022-12-10T11:32:38Z)
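A schematic sketch of the loop described above, under assumptions: the scorer and the sufficiency check are hypothetical callables standing in for the paper's joint document/sentence scorer and natural-logic proof system.

```python
# Schematic retrieve-and-rerank loop with early termination. The two callables
# are hypothetical stand-ins for the paper's joint scorer and proof system.
from typing import Callable

def multihop_retrieve(
    claim: str,
    score_candidates: Callable[[str, list[str]], list[str]],  # ranks candidates given claim + evidence so far
    is_sufficient: Callable[[str, list[str]], bool],          # proof-system sufficiency check
    max_hops: int = 3,
) -> list[str]:
    evidence: list[str] = []
    for _ in range(max_hops):
        # Each hop conditions on previously retrieved evidence (autoregressive retrieval).
        evidence.append(score_candidates(claim, evidence)[0])
        if is_sufficient(claim, evidence):
            break  # the proof closes; stop retrieving early
    return evidence
```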
- Revisiting DocRED -- Addressing the False Negative Problem in Relation Extraction [39.78594332093083]
We re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED.
We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points.
arXiv Detail & Related papers (2022-05-25T11:54:48Z)
- Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that the recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over state-of-the-art baselines (an illustrative sketch of generative retrieval follows).
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
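Generative retrieval replaces index scoring with sequence generation: a seq2seq model emits evidence identifiers (for FEVER, typically Wikipedia page titles) directly from the claim. The sketch below shows only that shape, using an off-the-shelf facebook/bart-base checkpoint; the checkpoint is untrained for this task, so its outputs are noise, and this is not GERE's released model or decoding setup.

```python
# Shape of generative evidence retrieval: generate candidate evidence
# identifiers from a claim with beam search. bart-base here is an untrained
# stand-in; GERE's actual model and decoding constraints are not reproduced.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

claim = "The Eiffel Tower is located in Berlin."
inputs = tokenizer(claim, return_tensors="pt")

# Beam search yields several candidate identifiers per claim.
outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=5, max_new_tokens=16
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```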