Related papers: The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges

The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges

URL: http://arxiv.org/abs/2411.15396v1
Date: Sat, 23 Nov 2024 00:43:27 GMT
Title: The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges
Authors: Jiqun Liu, Jiangen He,
Abstract summary: It is not clear to what extent large language models (LLMs) behave "rationally" Our study empirically confirms the cognitive bias risks embedded in LLM agents. It highlights the complexity and importance of debiasing AI agents.
Score: 4.65004369765875
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Can AI be cognitively biased in automated information judgment tasks? Despite recent progresses in measuring and mitigating social and algorithmic biases in AI and large language models (LLMs), it is not clear to what extent LLMs behave "rationally", or if they are also vulnerable to human cognitive bias triggers. To address this open problem, our study, consisting of a crowdsourcing user experiment and a LLM-enabled simulation experiment, compared the credibility assessments by LLM and human judges under potential decoy effects in an information retrieval (IR) setting, and empirically examined the extent to which LLMs are cognitively biased in COVID-19 medical (mis)information assessment tasks compared to traditional human assessors as a baseline. The results, collected from a between-subject user experiment and a LLM-enabled replicate experiment, demonstrate that 1) Larger and more recent LLMs tend to show a higher level of consistency and accuracy in distinguishing credible information from misinformation. However, they are more likely to give higher ratings for misinformation due to the presence of a more salient, decoy misinformation result; 2) While decoy effect occurred in both human and LLM assessments, the effect is more prevalent across different conditions and topics in LLM judgments compared to human credibility ratings. In contrast to the generally assumed "rationality" of AI tools, our study empirically confirms the cognitive bias risks embedded in LLM agents, evaluates the decoy impact on LLMs against human credibility assessments, and thereby highlights the complexity and importance of debiasing AI agents and developing psychology-informed AI audit techniques and policies for automated judgment tasks and beyond.

Related papers

Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformers-based systems to learn demographic biases present in the data.<n>We propose a privacy-enhancing framework to reduce gender information from the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z)
An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations [12.481311145515706]
We investigate the anchoring effect, a cognitive bias where the mind relies heavily on the first information as anchors to make affected judgments.<n>To facilitate studies at scale on the anchoring effect, we introduce a new dataset, SynAnchors.<n>Our findings show that LLMs' anchoring bias exists commonly with shallow-layer acting and is not eliminated by conventional strategies.
arXiv Detail & Related papers (2025-05-21T11:33:54Z)
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations [29.031539043555362]
Large Language Models (LLMs) are increasingly used to evaluate information systems. Recent studies suggest that LLM-based evaluations often align with human judgments. This paper examines scenarios where LLM-evaluators may falsely indicate success.
arXiv Detail & Related papers (2025-04-27T02:14:21Z)
Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design the standardized psychological scale specifically targeting large language models (LLM) We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
Bias in Large Language Models: Origin, Evaluation, and Mitigation [4.606140332500086]
Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice.
arXiv Detail & Related papers (2024-11-16T23:54:53Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
Evaluating Explanations Through LLMs: Beyond Traditional User Studies [7.377398767507683]
We reproduce a user study comparing counterfactual and causal explanations, replicating human participants with seven Large Language Models (LLMs) Our results show that (i) LLMs can replicate most conclusions from the original study, (ii) different LLMs yield varying levels of alignment in the results, and (iii) experimental factors such as LLM memory and output variability affect alignment with human responses.
arXiv Detail & Related papers (2024-10-23T11:31:52Z)
The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks. This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment [37.985947029716016]
Large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments.
arXiv Detail & Related papers (2024-09-24T12:23:15Z)
Investigating Context Effects in Similarity Judgements in Large Language Models [6.421776078858197]
Large Language Models (LLMs) have revolutionised the capability of AI models in comprehending and generating natural language text. We report an ongoing investigation on alignment of LLMs with human judgements affected by order bias.
arXiv Detail & Related papers (2024-08-20T10:26:02Z)
Metacognitive Myopia in Large Language Models [0.0]
Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups. We propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases. Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs.
arXiv Detail & Related papers (2024-08-10T14:43:57Z)
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.<n>Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.<n>Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system. We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, architectures and other factors profoundly impact the performance of LLMs. Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods. This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering technique.
arXiv Detail & Related papers (2024-03-22T14:47:35Z)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation. We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process. We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.