Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
- URL: http://arxiv.org/abs/2506.14092v1
- Date: Tue, 17 Jun 2025 01:14:22 GMT
- Title: Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
- Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary
- Abstract summary: We provide the first comprehensive investigation of positional biases across multiple large language models (LLMs). We find a quality-dependent shift: when options are high quality, models exhibit primacy bias, but favor later options when option quality is low. To distinguish superficial tie-breaking from true distortions of judgment, we introduce a framework that classifies pairwise preferences as robust, fragile, or indifferent.
- Score: 2.3936613583728064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) are increasingly used in decision-support systems across high-stakes domains such as hiring and university admissions, where decisions often involve selecting among competing alternatives. While prior work has noted positional order biases in LLM-driven comparisons, these biases have not been systematically dissected or linked to underlying preference structures. We provide the first comprehensive investigation of positional biases across multiple LLM architectures and domains, uncovering strong and consistent order effects, including a novel centrality bias not previously documented in human or machine decision-making. We also find a quality-dependent shift: when options are high quality, models exhibit primacy bias, but favor later options when option quality is low. We further identify a previously undocumented bias favoring certain names over others. To distinguish superficial tie-breaking from true distortions of judgment, we introduce a framework that classifies pairwise preferences as robust, fragile, or indifferent. We show that order effects can lead models to select strictly inferior options, and that positional biases are typically stronger than gender biases. These findings suggest that LLMs are not merely inheriting human-like biases, but exhibit distinct failure modes not seen in human decision-making. We propose targeted mitigation strategies, including a novel use of the temperature parameter, to reduce order-driven distortions.
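The robust/fragile/indifferent framework lends itself to a simple order-swap test. The sketch below is illustrative only and not the authors' implementation: `ask_pairwise` and `ask_rating` are hypothetical callables standing in for LLM queries, and the rating-based criterion used to separate "fragile" from "indifferent" is an assumption rather than the paper's exact definition.

```python
# Minimal sketch of an order-swap test for pairwise preferences, in the
# spirit of the robust / fragile / indifferent classification described
# above. `ask_pairwise` and `ask_rating` are hypothetical callables that
# wrap LLM queries; they are not part of the paper.

def classify_preference(option_a, option_b, ask_pairwise, ask_rating):
    """Return one of 'robust', 'fragile', or 'indifferent' for a pair."""
    # Present the pair in both orders and record the winner each time.
    winner_ab = ask_pairwise(first=option_a, second=option_b)
    winner_ba = ask_pairwise(first=option_b, second=option_a)

    if winner_ab == winner_ba:
        # The same option wins regardless of position: a robust preference.
        return "robust", winner_ab

    # The winner flips with presentation order. To separate superficial
    # tie-breaking from a genuine distortion of judgment, fall back on
    # independent ratings (an illustrative criterion, not necessarily the
    # paper's exact definition).
    if ask_rating(option_a) == ask_rating(option_b):
        return "indifferent", None
    return "fragile", None
```

Aggregated over many pairs, the share of fragile pairs gives a rough, assumption-laden measure of how order-driven a model's comparisons are; the paper's temperature-based mitigation is not sketched here.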
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are at odds with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks.
Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations.
This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z)
- Mitigating Selection Bias with Node Pruning and Auxiliary Options [11.835002896308545]
Large language models (LLMs) often exhibit systematic preferences for certain answer choices when responding to multiple-choice questions. This bias reduces the accuracy and reliability of LLM outputs, limiting their usefulness in decision-critical applications. We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an answer choice to reduce bias in both white-box and black-box settings.
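As a rough illustration of the auxiliary-option idea summarized above, the sketch below appends one extra answer choice to a multiple-choice prompt before querying a model. The auxiliary text ("None of the above.") and the prompt template are assumptions made for illustration, not the paper's formulation of AOI.

```python
# Illustrative prompt builder that injects an auxiliary answer choice.
# The auxiliary option text and template below are assumptions, not the
# AOI method as specified in the paper.

AUX_OPTION = "None of the above."  # hypothetical auxiliary choice

def build_mcq_prompt(question, options):
    labels = "ABCDEFGH"
    lines = [question]
    # Append the auxiliary option after the real answer choices.
    for label, text in zip(labels, list(options) + [AUX_OPTION]):
        lines.append(f"{label}. {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

# Example usage:
print(build_mcq_prompt(
    "Which option best summarizes the passage?",
    ["Summary 1", "Summary 2", "Summary 3"],
))
```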
arXiv Detail & Related papers (2024-09-27T15:53:54Z)
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems [74.47680026838128]
Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias.
We consider multifactorial selection bias affected by both item and rating value factors.
We propose smoothing and alternating gradient descent techniques to reduce variance and improve the robustness of its optimization.
arXiv Detail & Related papers (2024-04-29T12:18:21Z)
- Causality and Independence Enhancement for Biased Node Classification [56.38828085943763]
We propose a novel Causality and Independence Enhancement (CIE) framework, applicable to various graph neural networks (GNNs).
Our approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations.
Our approach, CIE, not only significantly enhances the performance of GNNs but also outperforms state-of-the-art debiased node classification methods.
arXiv Detail & Related papers (2023-10-14T13:56:24Z)
- HANS, are you clever? Clever Hans Effect Analysis of Neural Systems [1.6267479602370545]
Instruction-tuned Large Language Models (It-LLMs) have shown outstanding abilities to reason about the cognitive states, intentions, and reactions of the people involved, letting humans guide and comprehend day-to-day social interactions effectively.
Several multiple-choice question (MCQ) benchmarks have been proposed to construct solid assessments of the models' abilities.
However, earlier works demonstrate the presence of an inherent "order bias" in It-LLMs, posing challenges to appropriate evaluation.
arXiv Detail & Related papers (2023-09-21T20:52:18Z)
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
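The prior-separation idea behind PriDe can be illustrated numerically. A simplified reading: assume the observed probability over option IDs factorizes into a content-independent prior over the IDs times a content-dependent term; the prior can then be estimated from cyclic permutations of the same option contents and divided out. The sketch below follows that assumption and is not the authors' implementation.

```python
import numpy as np

def estimate_id_prior(prob_rows):
    """Estimate a content-independent prior over option IDs.

    prob_rows: array of shape (n_permutations, n_options), observed
    probabilities over option IDs, one row per cyclic permutation of the
    same option contents (so each content appears at each ID once).
    """
    # Averaging log-probabilities over the permutations cancels the
    # content-dependent term up to a constant, leaving the ID prior.
    log_prior = np.log(prob_rows).mean(axis=0)
    prior = np.exp(log_prior)
    return prior / prior.sum()

def debias(observed, prior):
    """Divide out the estimated option-ID prior and renormalize."""
    scores = observed / prior
    return scores / scores.sum()

# Toy example: a model that leans toward option A regardless of content.
obs = np.array([[0.50, 0.30, 0.20],
                [0.55, 0.25, 0.20],
                [0.45, 0.35, 0.20]])
prior = estimate_id_prior(obs)
print(prior)                   # estimated bias toward each option ID
print(debias(obs[0], prior))   # debiased distribution for the first row
```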
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)