A Rational Analysis of the Effects of Sycophantic AI
- URL: http://arxiv.org/abs/2602.14270v1
- Date: Sun, 15 Feb 2026 18:49:19 GMT
- Title: A Rational Analysis of the Effects of Sycophantic AI
- Authors: Rafael M. Batista, Thomas L. Griffiths
- Abstract summary: We argue that, unlike hallucinations that introduce falsehoods, sycophancy distorts reality by returning responses that are biased to reinforce existing beliefs. We show that when a Bayesian agent is provided with data sampled based on its current hypothesis, the agent becomes increasingly confident about that hypothesis but makes no progress towards the truth.
- Score: 7.021970577725834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People increasingly use large language models (LLMs) to explore ideas, gather information, and make sense of the world. In these interactions, they encounter agents that are overly agreeable. We argue that this sycophancy poses a unique epistemic risk to how individuals come to see the world: unlike hallucinations that introduce falsehoods, sycophancy distorts reality by returning responses that are biased to reinforce existing beliefs. We provide a rational analysis of this phenomenon, showing that when a Bayesian agent is provided with data that are sampled based on a current hypothesis, the agent becomes increasingly confident about that hypothesis but does not make any progress towards the truth. We test this prediction using a modified Wason 2-4-6 rule discovery task where participants (N=557) interacted with AI agents providing different types of feedback. Unmodified LLM behavior suppressed discovery and inflated confidence comparably to explicitly sycophantic prompting. By contrast, unbiased sampling from the true distribution yielded discovery rates five times higher. These results reveal how sycophantic AI distorts belief, manufacturing certainty where there should be doubt.
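To make the Bayesian argument concrete, here is a minimal sketch (not the paper's code; the hypothesis set, number range, size-principle likelihood, and sampler names are illustrative assumptions) of an agent doing Bayesian rule discovery in a Wason 2-4-6 style setting. When examples are sampled to fit the agent's current favourite hypothesis, the posterior concentrates on that hypothesis; when examples are sampled from the true rule, the posterior moves toward the truth.

```python
import random

LO, HI = 1, 20  # illustrative universe of integer triples

def triples(pred):
    """All triples (a, b, c) in the universe satisfying pred."""
    return [(a, b, c)
            for a in range(LO, HI + 1)
            for b in range(LO, HI + 1)
            for c in range(LO, HI + 1)
            if pred(a, b, c)]

# Candidate rules for a Wason 2-4-6 style task; the true rule is "any ascending triple".
HYPOTHESES = {
    "ascending":      lambda a, b, c: a < b < c,
    "step of 2":      lambda a, b, c: b - a == 2 and c - b == 2,
    "even ascending": lambda a, b, c: a < b < c and a % 2 == b % 2 == c % 2 == 0,
}
EXTENSIONS = {name: triples(pred) for name, pred in HYPOTHESES.items()}
TRUE_RULE = "ascending"

def update(posterior, example):
    """One Bayesian update with a size-principle likelihood: P(x|h) = 1/|h| if x is in h, else 0."""
    new = {h: p * (1.0 / len(EXTENSIONS[h]) if example in EXTENSIONS[h] else 0.0)
           for h, p in posterior.items()}
    z = sum(new.values())
    return {h: p / z for h, p in new.items()} if z > 0 else posterior

def sycophantic(posterior, rng):
    """Feedback biased to confirm the agent's current best guess."""
    favourite = max(posterior, key=posterior.get)
    return rng.choice(EXTENSIONS[favourite])

def unbiased(posterior, rng):
    """Feedback sampled from the true rule, regardless of the agent's beliefs."""
    return rng.choice(EXTENSIONS[TRUE_RULE])

def run(sampler, n_examples=10, seed=0):
    rng = random.Random(seed)
    posterior = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    posterior = update(posterior, (2, 4, 6))  # the classic seed triple favours the narrow "step of 2" rule
    for _ in range(n_examples):
        posterior = update(posterior, sampler(posterior, rng))
    return {h: round(p, 3) for h, p in posterior.items()}

if __name__ == "__main__":
    print("sycophantic feedback:", run(sycophantic))  # confidence typically piles onto "step of 2"
    print("unbiased feedback:   ", run(unbiased))     # posterior typically shifts to "ascending"
```

In this toy setup, hypothesis-confirming examples never discriminate between the agent's narrow pet rule and the broader true rule, so the size-principle likelihood keeps rewarding the narrow rule; only samples drawn from the true distribution contain the disconfirming evidence needed to discover it, mirroring the paper's claim that sycophantic feedback manufactures confidence without progress toward the truth.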
Related papers
- When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs [15.622799135126455]
We show that large language models (LLMs) continue to exhibit hallucinations, generating plausible yet incorrect responses.
We highlight a critical yet previously underexplored class of hallucinations driven by spurious correlations.
Existing hallucination detection methods, such as confidence-based filtering and inner-state probing, fundamentally fail in the presence of spurious correlations.
arXiv Detail & Related papers (2025-11-10T17:19:27Z)
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models [0.0]
Large language models (LLMs) are becoming deeply embedded in human communication and decision-making.
They inherit the ambiguity, bias, and lack of direct access to truth inherent in language itself.
This paper argues that LLMs operationalize System 1 cognition at scale: fast, associative, and persuasive, but without reflection or falsification.
arXiv Detail & Related papers (2025-10-16T13:19:44Z)
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence [31.666988490509237]
We show the pervasiveness and harmful impacts of sycophancy when people seek advice from AI.
We find that models are highly sycophantic, affirming users' actions 50% more than humans do.
Participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again.
arXiv Detail & Related papers (2025-10-01T19:26:01Z)
- Review of Hallucination Understanding in Large Language and Vision Models [65.29139004945712]
We present a framework for characterizing both image and text hallucinations across diverse applications.
Our investigations reveal that hallucinations often stem from predictable patterns in data distributions and inherited biases.
This survey provides a foundation for developing more robust and effective solutions to hallucinations in real-world generative AI systems.
arXiv Detail & Related papers (2025-09-26T09:23:08Z)
- Semantic Energy: Detecting LLM Hallucination Beyond Entropy [106.92072182161712]
Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations.
Uncertainty estimation is a feasible approach to detect such hallucinations.
We introduce Semantic Energy, a novel uncertainty estimation framework.
arXiv Detail & Related papers (2025-08-20T07:33:50Z)
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models [57.834711966432685]
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value.
We introduce the Bullshit Index, a novel metric quantifying large language models' indifference to truth.
We observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy.
arXiv Detail & Related papers (2025-07-10T07:11:57Z)
- Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists? [42.29911505696807]
Language model (LM) agents are increasingly used as autonomous decision-makers.
We examine LMs' ability to explore and infer causal relationships.
We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally evidenced, conjunctive ones.
arXiv Detail & Related papers (2025-05-14T17:59:35Z)
- Delusions of Large Language Models [62.43923767408462]
Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations.
We identify a more insidious phenomenon, LLM delusion, defined as high-belief hallucinations: incorrect outputs with abnormally high confidence, making them harder to detect and mitigate.
arXiv Detail & Related papers (2025-03-09T17:59:16Z)
- Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer [51.7407540261676]
We investigate a distinct type of hallucination, where a model can consistently answer a question correctly, but a seemingly trivial perturbation causes it to produce a hallucinated response with high certainty.
This phenomenon is particularly concerning in high-stakes domains such as medicine or law, where model certainty is often used as a proxy for reliability.
We show that CHOKE examples are consistent across prompts, occur in different models and datasets, and are fundamentally distinct from other hallucinations.
arXiv Detail & Related papers (2025-02-18T15:46:31Z)
- Prediction-Powered Causal Inferences [59.98498488132307]
We focus on Prediction-Powered Causal Inferences (PPCI).
We first show that conditional calibration guarantees valid PPCI at population level.
We then introduce a sufficient representation constraint transferring validity across experiments.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations about facts they know and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [58.96953392466609]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We introduce a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z)
- Interactive Visual Reasoning under Uncertainty [29.596555383319814]
We devise the IVRE environment for evaluating artificial agents' reasoning ability under uncertainty.
IVRE is an interactive environment featuring rich scenarios centered around Blicket detection.
arXiv Detail & Related papers (2022-06-18T13:32:41Z)