PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
- URL: http://arxiv.org/abs/2601.08363v1
- Date: Tue, 13 Jan 2026 09:22:16 GMT
- Title: PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
- Authors: Ziyang Zeng, Dun Zhang, Yu Yan, Xu Sun, Yudong Zhou, Yuqing Yang
- Abstract summary: PosIR (Position-Aware Information Retrieval) is a comprehensive benchmark designed to diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, constructed through a rigorous pipeline that ties relevance to precise reference spans.
- Score: 12.848308213591622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While dense retrieval models have achieved remarkable success, rigorous evaluation of their sensitivity to the position of relevant information (i.e., position bias) remains largely unexplored. Existing benchmarks typically employ position-agnostic relevance labels, conflating the challenge of processing long contexts with the bias against specific evidence locations. To address this challenge, we introduce PosIR (Position-Aware Information Retrieval), a comprehensive benchmark designed to diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, constructed through a rigorous pipeline that ties relevance to precise reference spans, enabling the strict disentanglement of document length from information position. Extensive experiments with 10 state-of-the-art embedding models reveal that: (1) Performance on PosIR in long-context settings correlates poorly with the MMTEB benchmark, exposing limitations in current short-text benchmarks; (2) Position bias is pervasive and intensifies with document length, with most models exhibiting primacy bias while certain models show unexpected recency bias; (3) Gradient-based saliency analysis further uncovers the distinct internal attention mechanisms driving these positional preferences. In summary, PosIR serves as a valuable diagnostic framework to foster the development of position-robust retrieval systems.
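The abstract's core protocol is to hold document length fixed while moving the relevant span to controlled positions, then compare retrieval scores across positions. A minimal sketch of that idea follows; the `bow_embed` encoder is a toy stand-in (a real evaluation would plug a dense embedding model into the `embed` parameter), and all function names here are illustrative, not PosIR's actual API:

```python
import math
from collections import Counter

def bow_embed(text):
    """Toy bag-of-words embedding; stands in for a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(cnt * b[term] for term, cnt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def position_bias_probe(query, evidence, filler_sents, embed=bow_embed):
    """Insert the evidence span at the beginning, middle, and end of a
    fixed-length filler document and score each variant against the query.
    Divergent scores across positions indicate position bias; identical
    scores (as with this order-insensitive toy encoder) indicate none."""
    positions = {
        "begin": 0,
        "middle": len(filler_sents) // 2,
        "end": len(filler_sents),
    }
    scores = {}
    for label, idx in positions.items():
        doc_sents = filler_sents[:idx] + [evidence] + filler_sents[idx:]
        scores[label] = cosine(embed(query), embed(" ".join(doc_sents)))
    return scores

filler = ["The weather report was unremarkable."] * 8
scores = position_bias_probe(
    "capital of France", "Paris is the capital of France.", filler)
```

Because every variant has the same length and content, any score gap is attributable to evidence position alone, which is the disentanglement the benchmark aims for.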
Related papers
- It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks [87.7937890373758]
Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. We introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks. We propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels.
arXiv Detail & Related papers (2026-02-12T16:31:01Z)
- Attention Basin: Why Contextual Position Matters in Large Language Models [16.11590856103274]
We show that models systematically assign higher attention to items at the beginning and end of a sequence, while neglecting those in the middle. We introduce Attention-Driven Reranking (AttnRank), a framework that estimates a model's intrinsic positional attention preferences. AttnRank is a model-agnostic, training-free, and plug-and-play method with minimal computational overhead.
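The summary above describes reordering retrieved items so the most relevant ones land in the positions a model attends to most. A minimal sketch of that matching step, assuming a positional preference profile is already available (the `slot_pref` values and function names here are hypothetical, not AttnRank's actual method):

```python
def attn_aware_rerank(items, relevance, slot_pref):
    """Assign higher-relevance items to the positions (slots) with higher
    estimated attention. relevance[i] scores item i; slot_pref[j] is the
    model's estimated attention weight for position j (hypothetical)."""
    by_relevance = sorted(range(len(items)), key=lambda i: -relevance[i])
    by_preference = sorted(range(len(items)), key=lambda j: -slot_pref[j])
    reordered = [None] * len(items)
    for item_idx, slot_idx in zip(by_relevance, by_preference):
        reordered[slot_idx] = items[item_idx]
    return reordered

items = ["a", "b", "c", "d"]
relevance = [0.1, 0.9, 0.5, 0.3]
slot_pref = [0.5, 0.1, 0.1, 0.4]  # primacy plus mild recency, as an example
print(attn_aware_rerank(items, relevance, slot_pref))  # ['b', 'd', 'a', 'c']
```

With this U-shaped preference profile, the two most relevant items end up at the first and last positions, matching the "attention basin" finding that middle positions are neglected.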
arXiv Detail & Related papers (2025-08-07T08:08:08Z)
- Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation [39.545788636148025]
We present the first comprehensive study of position bias in multimodal RAG systems. Our results reveal that multimodal interactions intensify position bias compared to unimodal settings. These findings highlight the need for evidence reordering or debiasing strategies to build more reliable and equitable generation systems.
arXiv Detail & Related papers (2025-05-30T06:48:02Z)
- An Empirical Study of Position Bias in Modern Information Retrieval [9.958646803388513]
This study investigates the position bias in information retrieval. Models tend to overemphasize content at the beginning of passages while neglecting semantically relevant information that appears later. Experiments show that when relevant information appears later in the passage, dense embedding models and ColBERT-style models suffer significant performance degradation.
arXiv Detail & Related papers (2025-05-20T05:29:01Z)
- Parallel Key-Value Cache Fusion for Position Invariant RAG [55.9809686190244]
Large Language Models (LLMs) are sensitive to the position of relevant information within contexts. We introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order.
arXiv Detail & Related papers (2025-01-13T17:50:30Z)
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Position bias in features [0.0]
Document-specific historical click-through rates can be important features in a dynamic ranking system.
This paper describes the properties of several such features, and tests them in controlled experiments.
arXiv Detail & Related papers (2024-02-04T22:15:30Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Analysing the Data-Driven Approach of Dynamically Estimating Positioning Accuracy [81.66581693967416]
We analyze the data-driven approach of determining the Dynamic Accuracy Estimation (DAE).
The work provides a wide overview of the data-driven approach of DAE determination in the context of the overall design of a positioning system.
arXiv Detail & Related papers (2020-11-20T16:18:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.