Related papers: ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge

URL: http://arxiv.org/abs/2506.14407v2
Date: Tue, 15 Jul 2025 13:16:23 GMT
Title: ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge
Authors: Zeinab Sadat Taghavi, Ali Modarressi, Yunpu Ma, Hinrich Schütze
Abstract summary: ImpliRet is a benchmark that shifts the reasoning challenge to document-side processing. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting.
Score: 49.65993318863458
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval systems are central to many NLP pipelines, but often rely on surface-level cues such as keyword overlap and lexical semantic similarity. To evaluate retrieval beyond these shallow signals, recent benchmarks introduce reasoning-heavy queries; however, they primarily shift the burden to query-side processing techniques -- like prompting or multi-hop retrieval -- that can help resolve complexity. In contrast, we present ImpliRet, a benchmark that shifts the reasoning challenge to document-side processing: The queries are simple, but relevance depends on facts stated implicitly in documents through temporal (e.g., resolving "two days ago"), arithmetic, and world knowledge relationships. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting: the best nDCG@10 is only 14.91%. We also test whether long-context models can overcome this limitation. But even with a short context of only thirty documents, including the positive document, GPT-o4-mini scores only 55.54%, showing that document-side reasoning remains a challenge. Our codes are available at: github.com/ZeinabTaghavi/IMPLIRET
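The abstract reports results as nDCG@10. As a rough illustration of what that metric measures, here is a minimal sketch assuming binary relevance (a document is either a positive for the query or not). The function names and document IDs are hypothetical, and this is not taken from the ImpliRet codebase; the benchmark's own evaluation script may differ in details.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k gains (rank 1 has discount log2(2) = 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevant_ids, k=10):
    """nDCG@k with binary relevance: gain is 1.0 for a relevant document, else 0.0."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_doc_ids]
    # The ideal ranking places every relevant document at the top.
    ideal_gains = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranking: the single positive document "d1" is retrieved at rank 2.
    score = ndcg_at_k(["d3", "d1", "d2"], {"d1"})
    print(round(score, 4))  # 0.6309
```

With a single positive per query, the score is 1.0 when the positive is ranked first, decays logarithmically with its rank, and is 0.0 when it falls outside the top 10.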

Related papers

Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG). RAG requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z)
Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries [36.93438185371322]
Current dense retrievers retrieve the relevant documents within a corpus via embedding similarities. We propose a neuro-symbolic information retrieval method, namely NS-IR, to optimize the embeddings of naive natural language. Our experiments demonstrate that NS-IR achieves superior zero-shot retrieval performance on web search and low-resource retrieval tasks.
arXiv Detail & Related papers (2025-05-28T12:37:09Z)
Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents [17.506934704019226]
Standardized documents share similar formats, such as repetitive boilerplate text and similar table structures. This similarity causes traditional RAG methods to misidentify near-duplicate text, leading to duplicate retrieval that undermines accuracy and completeness. We propose the Hierarchical Retrieval with Evidence Curation framework to address these issues.
arXiv Detail & Related papers (2025-05-26T11:08:23Z)
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence [56.09494651178128]
Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). We quantify the impact of biases, such as a preference for shorter documents, on retrievers like Dragon+ and Contriever. We uncover major vulnerabilities, showing retrievers favor shorter documents, early positions, repeated entities, and literal matches, all while ignoring the answer's presence!
arXiv Detail & Related papers (2025-03-06T23:23:13Z)
Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
The Deliberate Thinking based Dense Retriever (DEBATER) enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process. Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning. We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
arXiv Detail & Related papers (2025-02-18T02:49:40Z)
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR). While analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the major errors (53.5%) are due to missing document context. Our benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.