AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment
- URL: http://arxiv.org/abs/2409.16022v2
- Date: Tue, 8 Oct 2024 10:23:00 GMT
- Title: AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment
- Authors: Nuo Chen, Jiqun Liu, Xiaoyu Dong, Qijiong Liu, Tetsuya Sakai, Xiao-Ming Wu
- Abstract summary: Large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data.
We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments.
- Score: 37.985947029716016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cognitive biases are systematic deviations in thinking that lead to irrational judgments and problematic decision-making, extensively studied across various fields. Recently, large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. While social biases in LLMs have been well studied, cognitive biases have received less attention, with existing research focusing on specific scenarios. The broader impact of cognitive biases on LLMs in various decision-making contexts remains underexplored. We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments, a core task and widely discussed research topic in the Information Retrieval (IR) community. The priming effect occurs when exposure to certain stimuli unconsciously affects subsequent behavior and decisions. Our experiment employed 10 topics from the TREC 2019 Deep Learning passage track collection and tested AI judgments under different document relevance scores, batch lengths, and LLM models, including GPT-3.5, GPT-4, LLaMa2-13B, and LLaMa2-70B. Results showed that LLMs tend to give lower scores to later documents if earlier ones have high relevance, and vice versa, regardless of the combination and model used. Our findings demonstrate that LLM judgments, similar to human judgments, are also influenced by threshold priming biases, and suggest that researchers and system engineers should take potential human-like cognitive biases into account when designing, evaluating, and auditing LLMs in IR tasks and beyond.
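Below is a minimal sketch (not the authors' released code) of how such a threshold-priming probe could be run: the same target passages are scored once after a prelude of high-relevance passages and once after a prelude of low-relevance ones, and the mean target scores are compared. The model name, prompt wording, 0-3 grading scale, and function names are illustrative assumptions; the paper itself drew topics and passages from the TREC 2019 Deep Learning passage track.

```python
# Hedged sketch of a batch relevance-assessment priming experiment.
# Assumes an OpenAI-compatible API key is configured; prompt format and
# the 0-3 relevance scale are assumptions, not the paper's exact setup.
import re
from statistics import mean
from openai import OpenAI

client = OpenAI()

def judge_batch(query: str, passages: list[str], model: str = "gpt-4") -> list[int]:
    """Grade every passage in one batch prompt on a 0-3 relevance scale."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "For each passage, output a line '<index>: <score>' where <score> is the "
        "relevance of that passage to the query on a 0-3 scale."
    )
    reply = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    # Parse '<index>: <score>' lines from the reply; -1 marks unparsed passages.
    scores = {int(i): int(s) for i, s in re.findall(r"(\d+)\s*:\s*([0-3])", reply)}
    return [scores.get(i + 1, -1) for i in range(len(passages))]

def priming_gap(query: str, high_prelude: list[str], low_prelude: list[str],
                targets: list[str]) -> float:
    """Mean target score after a high-relevance prelude minus after a low one.
    A negative gap is consistent with the threshold-priming effect reported here."""
    n = len(targets)
    high_scores = judge_batch(query, high_prelude + targets)[-n:]
    low_scores = judge_batch(query, low_prelude + targets)[-n:]
    return mean(high_scores) - mean(low_scores)
```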
Related papers
- Anchoring Bias in Large Language Models: An Experimental Study [5.229564709919574]
Large Language Models (LLMs) like GPT-4 and Gemini have significantly advanced artificial intelligence.
This study delves into anchoring bias, a cognitive bias where initial information disproportionately influences judgment.
arXiv Detail & Related papers (2024-12-09T15:45:03Z)
- Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments [24.15688619889342]
Large Language Models (LLMs) are trained on large corpora written by humans and demonstrate high performance on various tasks.
As humans are susceptible to cognitive biases, LLMs can also be influenced by these biases, leading to irrational decision-making.
arXiv Detail & Related papers (2024-11-30T02:37:59Z)
- The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges [4.65004369765875]
It is not clear to what extent large language models (LLMs) behave "rationally".
Our study empirically confirms the cognitive bias risks embedded in LLM agents.
It highlights the complexity and importance of debiasing AI agents.
arXiv Detail & Related papers (2024-11-23T00:43:27Z)
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are in opposition with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Large Language Models are Biased Reinforcement Learners [0.0]
We show that large language models (LLMs) exhibit behavioral signatures of a relative value bias.
Computational cognitive modeling reveals that LLM behavior is well-described by a simple RL algorithm.
arXiv Detail & Related papers (2024-05-19T01:43:52Z)
- Explaining Large Language Models Decisions Using Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
arXiv Detail & Related papers (2024-03-29T22:49:43Z)
- Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.