Related papers: An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations

URL: http://arxiv.org/abs/2505.15392v1
Date: Wed, 21 May 2025 11:33:54 GMT
Title: An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations
Authors: Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei, Yutao Yue, Xinlei He
Abstract summary: We investigate the anchoring effect, a cognitive bias where the mind relies heavily on the first information as anchors to make affected judgments. To facilitate studies at scale on the anchoring effect, we introduce a new dataset, SynAnchors. Our findings show that LLMs' anchoring bias exists commonly with shallow-layer acting and is not eliminated by conventional strategies.
Score: 12.481311145515706
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias where the mind relies heavily on the first information as anchors to make affected judgments. We explore whether LLMs are affected by anchoring, the underlying mechanisms, and potential mitigation strategies. To facilitate studies at scale on the anchoring effect, we introduce a new dataset, SynAnchors. Combining refined evaluation metrics, we benchmark current widely used LLMs. Our findings show that LLMs' anchoring bias exists commonly with shallow-layer acting and is not eliminated by conventional strategies, while reasoning can offer some mitigation. This recontextualization via cognitive psychology urges that LLM evaluations focus not on standard benchmarks or over-optimized robustness tests, but on cognitive-bias-aware trustworthy evaluation.

Related papers

Investigating the Effects of Cognitive Biases in Prompts on Large Language Model Outputs [3.7302076138352205]
This paper investigates the influence of cognitive biases on Large Language Models (LLMs) outputs. Cognitive biases, such as confirmation and availability biases, can distort user inputs through prompts.
arXiv Detail & Related papers (2025-06-14T04:18:34Z)
Cognitive Debiasing Large Language Models for Decision-Making [71.2409973056137]
Large language models (LLMs) have shown potential in supporting decision-making applications. We propose a cognitive debiasing approach, called self-debiasing, that enhances the reliability of LLMs. Our method follows three sequential steps (bias determination, bias analysis, and cognitive debiasing) to iteratively mitigate potential cognitive biases in prompts.
arXiv Detail & Related papers (2025-04-05T11:23:05Z)
Anchoring Bias in Large Language Models: An Experimental Study [5.229564709919574]
Large Language Models (LLMs) like GPT-4 and Gemini have significantly advanced artificial intelligence. This study delves into anchoring bias, a cognitive bias where initial information disproportionately influences judgment.
arXiv Detail & Related papers (2024-12-09T15:45:03Z)
CBEval: A framework for evaluating and interpreting cognitive biases in LLMs [1.4633779950109127]
Large Language models exhibit notable gaps in their cognitive processes. As reflections of human-generated data, these models have the potential to inherit cognitive biases.
arXiv Detail & Related papers (2024-12-04T05:53:28Z)
Cognitive Biases in Large Language Models for News Recommendation [68.90354828533535]
This paper explores the potential impact of cognitive biases on large language models (LLMs) based news recommender systems. We discuss strategies to mitigate these biases through data augmentation, prompt engineering and learning algorithms aspects.
arXiv Detail & Related papers (2024-10-03T18:42:07Z)
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment [37.985947029716016]
Large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments.
arXiv Detail & Related papers (2024-09-24T12:23:15Z)
Metacognitive Myopia in Large Language Models [0.0]
Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups. We propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases. Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs.
arXiv Detail & Related papers (2024-08-10T14:43:57Z)
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics. Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models. It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs [6.090496490133132]
We propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach for bias mitigation replacing human feedback in traditional RLHF. We utilize LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning.
arXiv Detail & Related papers (2024-04-15T22:18:50Z)
Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks. In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation [60.65820977963331]
We introduce a novel evaluation paradigm for Large Language Models (LLMs). This paradigm shifts the emphasis from result-oriented assessments, which often neglect the reasoning process, to a more comprehensive evaluation. By applying this paradigm in the GSM8K dataset, we have developed the MR-GSM8K benchmark.
arXiv Detail & Related papers (2023-12-28T15:49:43Z)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation. We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process. We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.