PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
- URL: http://arxiv.org/abs/2601.08363v1
- Date: Tue, 13 Jan 2026 09:22:16 GMT
- Title: PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark
- Authors: Ziyang Zeng, Dun Zhang, Yu Yan, Xu Sun, Yudong Zhou, Yuqing Yang
- Abstract summary: PosIR (Position-Aware Information Retrieval) is a comprehensive benchmark designed to diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, constructed through a rigorous pipeline that ties relevance to precise reference spans.
- Score: 12.848308213591622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While dense retrieval models have achieved remarkable success, rigorous evaluation of their sensitivity to the position of relevant information (i.e., position bias) remains largely unexplored. Existing benchmarks typically employ position-agnostic relevance labels, conflating the challenge of processing long contexts with the bias against specific evidence locations. To address this challenge, we introduce PosIR (Position-Aware Information Retrieval), a comprehensive benchmark designed to diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, constructed through a rigorous pipeline that ties relevance to precise reference spans, enabling the strict disentanglement of document length from information position. Extensive experiments with 10 state-of-the-art embedding models reveal that: (1) Performance on PosIR in long-context settings correlates poorly with the MMTEB benchmark, exposing limitations in current short-text benchmarks; (2) Position bias is pervasive and intensifies with document length, with most models exhibiting primacy bias while certain models show unexpected recency bias; (3) Gradient-based saliency analysis further uncovers the distinct internal attention mechanisms driving these positional preferences. In summary, PosIR serves as a valuable diagnostic framework to foster the development of position-robust retrieval systems.
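The abstract's core protocol is to hold document length fixed while moving the relevant span to controlled positions, then compare retrieval scores across positions. A minimal sketch of that idea follows; the `bow_embed` encoder is a toy stand-in (a real evaluation would plug a dense embedding model into the `embed` parameter), and all function names here are illustrative, not PosIR's actual API:

```python
import math
from collections import Counter

def bow_embed(text):
    """Toy bag-of-words embedding; stands in for a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(cnt * b[term] for term, cnt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def position_bias_probe(query, evidence, filler_sents, embed=bow_embed):
    """Insert the evidence span at the beginning, middle, and end of a
    fixed-length filler document and score each variant against the query.
    Divergent scores across positions indicate position bias; identical
    scores (as with this order-insensitive toy encoder) indicate none."""
    positions = {
        "begin": 0,
        "middle": len(filler_sents) // 2,
        "end": len(filler_sents),
    }
    scores = {}
    for label, idx in positions.items():
        doc_sents = filler_sents[:idx] + [evidence] + filler_sents[idx:]
        scores[label] = cosine(embed(query), embed(" ".join(doc_sents)))
    return scores

filler = ["The weather report was unremarkable."] * 8
scores = position_bias_probe(
    "capital of France", "Paris is the capital of France.", filler)
```

Because every variant has the same length and content, any score gap is attributable to evidence position alone, which is the disentanglement the benchmark aims for.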
Related papers
- It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks [87.7937890373758]
Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. We introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks. We propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels.
arXiv Detail & Related papers (2026-02-12T16:31:01Z)
- Attention Basin: Why Contextual Position Matters in Large Language Models [16.11590856103274]
We show that models systematically assign higher attention to items at the beginning and end of a sequence, while neglecting those in the middle. We introduce Attention-Driven Reranking (AttnRank), a framework that estimates a model's intrinsic positional attention preferences. AttnRank is a model-agnostic, training-free, and plug-and-play method with minimal computational overhead.
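The summary above describes reordering retrieved items so the most relevant ones land in the positions a model attends to most. A minimal sketch of that matching step, assuming a positional preference profile is already available (the `slot_pref` values and function names here are hypothetical, not AttnRank's actual method):

```python
def attn_aware_rerank(items, relevance, slot_pref):
    """Assign higher-relevance items to the positions (slots) with higher
    estimated attention. relevance[i] scores item i; slot_pref[j] is the
    model's estimated attention weight for position j (hypothetical)."""
    by_relevance = sorted(range(len(items)), key=lambda i: -relevance[i])
    by_preference = sorted(range(len(items)), key=lambda j: -slot_pref[j])
    reordered = [None] * len(items)
    for item_idx, slot_idx in zip(by_relevance, by_preference):
        reordered[slot_idx] = items[item_idx]
    return reordered

items = ["a", "b", "c", "d"]
relevance = [0.1, 0.9, 0.5, 0.3]
slot_pref = [0.5, 0.1, 0.1, 0.4]  # primacy plus mild recency, as an example
print(attn_aware_rerank(items, relevance, slot_pref))  # ['b', 'd', 'a', 'c']
```

With this U-shaped preference profile, the two most relevant items end up at the first and last positions, matching the "attention basin" finding that middle positions are neglected.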
arXiv Detail & Related papers (2025-08-07T08:08:08Z)
- Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation [39.545788636148025]
We present the first comprehensive study of position bias in multimodal RAG systems. Our results reveal that multimodal interactions intensify position bias compared to unimodal settings. These findings highlight the need for evidence reordering or debiasing strategies to build more reliable and equitable generation systems.
arXiv Detail & Related papers (2025-05-30T06:48:02Z)
- An Empirical Study of Position Bias in Modern Information Retrieval [9.958646803388513]
This study investigates the position bias in information retrieval. Models tend to overemphasize content at the beginning of passages while neglecting semantically relevant information that appears later. Experiments show that when relevant information appears later in the passage, dense embedding models and ColBERT-style models suffer significant performance degradation.
arXiv Detail & Related papers (2025-05-20T05:29:01Z)
- Parallel Key-Value Cache Fusion for Position Invariant RAG [55.9809686190244]
Large Language Models (LLMs) are sensitive to the position of relevant information within contexts. We introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order.
arXiv Detail & Related papers (2025-01-13T17:50:30Z)
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Position bias in features [0.0]
Document-specific historical click-through rates can be important features in a dynamic ranking system.
This paper describes the properties of several such features, and tests them in controlled experiments.
arXiv Detail & Related papers (2024-02-04T22:15:30Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Analysing the Data-Driven Approach of Dynamically Estimating Positioning Accuracy [81.66581693967416]
We analyze the data-driven approach of determining the Dynamic Accuracy Estimation (DAE).
The work provides a wide overview of the data-driven approach of DAE determination in the context of the overall design of a positioning system.
arXiv Detail & Related papers (2020-11-20T16:18:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.