Positional Bias in Multimodal Embedding Models: Do They Favor the Beginning, the Middle, or the End?

URL: http://arxiv.org/abs/2511.11216v1
Date: Fri, 14 Nov 2025 12:15:46 GMT
Title: Positional Bias in Multimodal Embedding Models: Do They Favor the Beginning, the Middle, or the End?
Authors: Kebin Wu, Fatima Albreiki,
Abstract summary: We investigate positional bias in multimodal representation models, specifically in the context of image-text retrieval. Our experiments demonstrate that positional bias is prevalent in multimodal models, but manifests differently across modalities. We find that this bias arises from, or is amplified by, a combination of factors, including the positional encoding scheme, training loss, context importance, and the nature of using image-text pairs in multimodal training.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Positional bias, where models overemphasize certain positions regardless of content, has been shown to negatively impact model performance across various tasks. While recent research has extensively examined positional bias in text generation models, its presence and effects in representation models remain underexplored. Even less is known about such biases in multimodal models. In this work, we investigate positional bias in multimodal representation models, specifically in the context of image-text retrieval. We begin by distinguishing between context importance and positional bias, and then assess the presence and extent of positional bias across different models and datasets. Our experiments demonstrate that positional bias is prevalent in multimodal models, but manifests differently across modalities: text encoders tend to exhibit bias toward the beginning of the input, whereas image encoders show bias at both the beginning and end. Furthermore, we find that this bias arises from, or is amplified by, a combination of factors, including the positional encoding scheme, training loss, context importance, and the nature of using image-text pairs in multimodal training.
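To make the distinction between context importance and positional bias concrete, here is a minimal Python sketch of a position probe. It assumes a hypothetical toy encoder (`toy_embed`), not any model evaluated in the paper: the same target sentence is placed at the beginning, middle, or end of unrelated filler sentences, so content is held fixed and only position varies. The helper names, the filler sentences, and the position-decay weighting are illustrative assumptions, not the authors' protocol.

```python
import math
from typing import Callable, Dict, List

SparseVec = Dict[str, float]


def toy_embed(sentences: List[str], decay: float = 0.5) -> SparseVec:
    """Hypothetical bag-of-words encoder whose per-sentence weight shrinks with position."""
    vec: SparseVec = {}
    for i, sentence in enumerate(sentences):
        weight = 1.0 / (1.0 + decay * i)  # earlier sentences count more (built-in bias)
        for token in sentence.lower().split():
            vec[token] = vec.get(token, 0.0) + weight
    return vec


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def insert_at(fillers: List[str], target: str, position: str) -> List[str]:
    """Place the target sentence at the beginning, middle, or end of the fillers."""
    index = {"beginning": 0, "middle": len(fillers) // 2, "end": len(fillers)}[position]
    return fillers[:index] + [target] + fillers[index:]


def positional_probe(
    embed: Callable[[List[str]], SparseVec],
    query: str,
    target: str,
    fillers: List[str],
) -> Dict[str, float]:
    """Score the same target at three positions; only its position changes."""
    q = embed([query])
    return {
        pos: cosine(q, embed(insert_at(fillers, target, pos)))
        for pos in ("beginning", "middle", "end")
    }


if __name__ == "__main__":
    # Illustrative fillers chosen to share no tokens with the query or target.
    fillers = [
        "clouds drift over distant hills",
        "markets open early every sunday",
        "trains arrive late during winter",
        "children play chess after school",
    ]
    scores = positional_probe(
        toy_embed,
        "red bicycle near brick wall",
        "red bicycle leans against brick wall",
        fillers,
    )
    for pos, score in scores.items():
        print(f"{pos:>9}: {score:.3f}")
```

Because this toy encoder down-weights later sentences by construction, it scores the beginning placement highest, mimicking the text-encoder pattern the abstract reports. An encoder with bias at both ends, as reported for image encoders, would instead score the middle placement lowest. Any encoder that maps a list of sentences to a vector could be passed in place of `toy_embed`.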

Related papers

BLADE: Bias-Linked Adaptive DEbiasing
BLADE is a generative debiasing framework that requires no prior knowledge of bias or bias-conflicting samples. We evaluate BLADE on multiple benchmark datasets and show that it significantly outperforms state-of-the-art methods.
Date: 2025-10-05T12:28:54Z
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
Date: 2025-06-12T08:47:40Z
Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs
We present a study across five typologically distinct languages (English, Russian, German, Hindi, and Vietnamese). We examine how position bias interacts with prompt strategies and affects output entropy.
Date: 2025-05-22T02:23:00Z
An Empirical Study of Position Bias in Modern Information Retrieval
This study investigates position bias in information retrieval. Models tend to overemphasize content at the beginning of passages while neglecting semantically relevant information that appears later. Experiments show that when relevant information appears later in the passage, dense embedding models and ColBERT-style models suffer significant performance degradation.
Date: 2025-05-20T05:29:01Z
How far can bias go? -- Tracing bias from pretraining data to alignment
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
Date: 2024-11-28T16:20:25Z
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models
We propose a framework to identify, quantify, and explain biases in an open set setting. This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions. We show two variations of this framework: OpenBias and GradBias.
Date: 2024-08-29T16:51:07Z
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
Vision-language models pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with objects or scenarios. We propose a framework that incorporates causal mediation analysis to measure and map the pathways of bias generation and propagation.
Date: 2024-07-03T05:19:45Z
MIST: Mitigating Intersectional Bias with Disentangled Cross-Attention Editing in Text-to-Image Diffusion Models
We introduce a method that addresses intersectional bias in diffusion-based text-to-image models by modifying cross-attention maps in a disentangled manner. Our approach utilizes a pre-trained Stable Diffusion model, eliminates the need for an additional set of reference images, and preserves the original quality for unaltered concepts.
Date: 2024-03-28T17:54:38Z
Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias
We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.
Date: 2024-01-03T21:38:40Z
Current Topological and Machine Learning Applications for Bias Detection in Text
This study utilizes the RedditBias database to analyze textual biases. Four transformer models, including BERT and RoBERTa variants, were explored. Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
Date: 2023-11-22T16:12:42Z
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips. We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
Date: 2023-09-17T15:58:27Z
General Greedy De-bias Learning
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
Date: 2021-12-20T14:47:32Z
Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models
We evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions. We have observed that image distortions have a relationship with the performance gap of the model across different subgroups.
Date: 2021-08-14T16:49:05Z

This list is automatically generated from the titles and abstracts of the papers in this site.