Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences
- URL: http://arxiv.org/abs/2506.23743v2
- Date: Tue, 01 Jul 2025 10:39:07 GMT
- Title: Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences
- Authors: Tiziano Labruna, Simone Gallo, Giovanni Da San Martino
- Abstract summary: We quantify and analyze positional bias across five large language models under varying degrees of answer uncertainty. We observe that positional bias is nearly absent under low-uncertainty conditions, but grows exponentially as it becomes harder to decide which option is correct.
- Score: 8.389558188293641
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positional bias in binary question answering occurs when a model systematically favors one choice over another based solely on the ordering of presented options. In this study, we quantify and analyze positional bias across five large language models under varying degrees of answer uncertainty. We re-adapted the SQuAD-it dataset by adding an extra incorrect answer option and then created multiple versions with progressively less context and more out-of-context answers, yielding datasets that range from low to high uncertainty. Additionally, we evaluate two naturally higher-uncertainty benchmarks: (1) WebGPT - question pairs with unequal human-assigned quality scores, and (2) Winning Arguments - where models predict the more persuasive argument in Reddit's r/ChangeMyView exchanges. Across each dataset, the order of the "correct" (or higher-quality/persuasive) option is systematically flipped (first placed in position 1, then in position 2) to compute both Preference Fairness and Position Consistency. We observe that positional bias is nearly absent under low-uncertainty conditions, but grows exponentially as it becomes harder to decide which option is correct.
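The order-flipping protocol described above can be sketched in a few lines. This is an illustrative approximation, not the paper's exact formulas: `position_consistency` and `preference_fairness` here are plausible readings of the two metric names, assuming the model's pick is recorded as the position (1 or 2) it selected under each ordering.

```python
def position_consistency(choices_ab, choices_ba):
    """Fraction of items where the model picks the same underlying option
    under both orderings. choices_ab[i] is the position (1 or 2) picked
    when option A is first; choices_ba[i] is the position picked after the
    order is flipped. Picking the same content means picking *opposite*
    positions, so consistency counts pairs where the positions differ."""
    consistent = sum(a != b for a, b in zip(choices_ab, choices_ba))
    return consistent / len(choices_ab)

def preference_fairness(choices_ab, choices_ba):
    """1 minus the absolute gap between how often slot 1 and slot 2 are
    selected across both orderings: 1.0 means no positional preference,
    0.0 means the model always picks the same slot."""
    picks = list(choices_ab) + list(choices_ba)
    p1 = sum(c == 1 for c in picks) / len(picks)
    return 1.0 - abs(p1 - (1.0 - p1))
```

Under these definitions, a model that always follows the content (picks position 1 before the flip, position 2 after) scores 1.0 on both metrics, while a model that always picks slot 1 regardless of content scores 0.0 on both.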
Related papers
- SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models [0.27309692684728604]
Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels. This study introduces SCOPE, an evaluation framework designed to measure and mitigate such selection bias in a dataset-independent manner.
arXiv Detail & Related papers (2025-07-24T08:28:17Z)
- Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models [2.3936613583728064]
We present the first comprehensive study of position biases across multiple large language models (LLMs). We find strong and consistent order effects, including a quality-dependent shift. We also identify two previously undocumented biases in both human and machine decision-making.
arXiv Detail & Related papers (2025-06-17T01:14:22Z)
- Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models [49.46335932942725]
We study how positional bias interacts with model uncertainty, syntax, and prompting. We present a cross-linguistic study across five typologically distinct languages.
arXiv Detail & Related papers (2025-05-22T02:23:00Z)
- Mitigating Selection Bias with Node Pruning and Auxiliary Options [11.835002896308545]
Large language models (LLMs) often exhibit systematic preferences for certain answer choices when responding to multiple-choice questions. This bias reduces the accuracy and reliability of LLM outputs, limiting their usefulness in decision-critical applications. We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an answer choice to reduce bias in both white-box and black-box settings.
arXiv Detail & Related papers (2024-09-27T15:53:54Z)
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z)
- Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model tends to become more biased when it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z)
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim to answer a question using its relevant paragraph and the question-answer pairs from earlier turns of the conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our models, Answer Selection-based realistic Conversation Question Answering, on two standard ConvQA datasets.
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
- BBQ: A Hand-Built Bias Benchmark for Question Answering [25.108222728383236]
It is well documented that NLP models learn social biases present in the world, but little work has been done to show how these biases manifest in actual model outputs for applied tasks like question answering (QA).
We introduce the Bias Benchmark for QA (BBQ), a dataset consisting of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant to U.S. English-speaking contexts.
We find that models strongly rely on stereotypes when the context is ambiguous, meaning that the model's outputs consistently reproduce harmful biases in this setting.
arXiv Detail & Related papers (2021-10-15T16:43:46Z)
- Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
We stress the language bias in Visual Question Answering (VQA) that comes from two aspects, i.e., distribution bias and shortcut bias.
We propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning.
GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard for the biased models to solve.
arXiv Detail & Related papers (2021-07-27T08:02:49Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on the adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Look at the First Sentence: Position Bias in Question Answering [21.532952595659705]
We study position bias in question answering models such as BiDAF and BERT.
We train models with various de-biasing methods including entropy regularization and bias ensembling.
We find that using the prior distribution of answer positions as a bias model is very effective at reducing position bias.
arXiv Detail & Related papers (2020-04-30T06:25:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.