Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
- URL: http://arxiv.org/abs/2602.19141v1
- Date: Sun, 22 Feb 2026 12:13:44 GMT
- Title: Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
- Authors: Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, Joshua B. Tenenbaum
- Abstract summary: We show that even an idealized Bayes-rational user is vulnerable to delusional spiraling. This effect persists in the face of two candidate mitigations.
- Score: 47.64440749179653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "AI psychosis" or "delusional spiraling" is an emerging phenomenon where AI chatbot users find themselves dangerously confident in outlandish beliefs after extended chatbot conversations. This phenomenon is typically attributed to AI chatbots' well-documented bias towards validating users' claims, a property often called "sycophancy." In this paper, we probe the causal link between AI sycophancy and AI-induced psychosis through modeling and simulation. We propose a simple Bayesian model of a user conversing with a chatbot, and formalize notions of sycophancy and delusional spiraling in that model. We then show that in this model, even an idealized Bayes-rational user is vulnerable to delusional spiraling, and that sycophancy plays a causal role. Furthermore, this effect persists in the face of two candidate mitigations: preventing chatbots from hallucinating false claims, and informing users of the possibility of model sycophancy. We conclude by discussing the implications of these results for model developers and policymakers concerned with mitigating the problem of delusional spiraling.
Related papers
- Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence [31.666988490509237]
We show the pervasiveness and harmful impacts of sycophancy when people seek advice from AI. We find that models are highly sycophantic, affirming users' actions 50% more than humans do. Participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again.
arXiv Detail & Related papers (2025-10-01T19:26:01Z)
- Hallucinating with AI: AI Psychosis as Distributed Delusions [0.0]
Generative AI systems such as ChatGPT, Claude, Gemini, DeepSeek, and Grok create false outputs. In popular terminology, these have been dubbed AI hallucinations. I argue that when viewed through the lens of distributed cognition theory, we can better see the ways in which inaccurate beliefs, distorted memories and self-narratives, and delusional thinking can emerge.
arXiv Detail & Related papers (2025-08-27T05:51:19Z)
- AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots [0.5161531917413706]
We introduce a simple response evaluation framework (an AI chaperone agent) created by repurposing a state-of-the-art language model to evaluate ongoing conversations for parasocial cues. Iterative evaluation with five-stage testing successfully identified all parasocial conversations while avoiding false positives under a unanimity rule. These findings provide preliminary evidence that AI chaperones can be a viable solution for reducing the risk of parasocial relationships.
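A minimal sketch of the unanimity rule described above, assuming that "five-stage testing" means five independent judge passes; `judge` is a hypothetical stand-in for the repurposed language model, not an API from the paper.

```python
from typing import Callable, Sequence

def chaperone_flags(conversation: Sequence[str],
                    judge: Callable[[str], bool],
                    n_passes: int = 5) -> bool:
    """Flag a conversation as parasocial only if every judge pass agrees.

    Under the unanimity rule a single dissenting pass suppresses the
    flag, trading recall for a low false-positive rate.
    """
    transcript = "\n".join(conversation)
    return all(judge(transcript) for _ in range(n_passes))

def dummy_judge(transcript: str) -> bool:
    # Toy keyword heuristic, for demonstration only.
    return "only friend" in transcript.lower()

print(chaperone_flags(["User: You're my only friend.", "Bot: ..."], dummy_judge))
```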
arXiv Detail & Related papers (2025-08-21T17:43:24Z)
- Ask ChatGPT: Caveats and Mitigations for Individual Users of AI Chatbots [10.977907906989342]
ChatGPT and other Large Language Model (LLM)-based AI chatbots are becoming increasingly integrated into individuals' daily lives. What concerns and risks do these systems pose for individual users? What potential harms might they cause, and how can these be mitigated?
arXiv Detail & Related papers (2025-08-14T01:40:13Z)
- Manipulation and the AI Act: Large Language Model Chatbots and the Danger of Mirrors [0.0]
Personifying AI chatbots could foreseeably increase users' trust in them. However, it could also make them more capable of manipulation, by creating the illusion of a close and intimate relationship with an artificial entity. The European Commission has finalized the AI Act, with the EU Parliament making amendments banning manipulative and deceptive AI systems that cause significant harm to users.
arXiv Detail & Related papers (2025-03-24T06:56:29Z)
- Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards [93.16294577018482]
Arena, the most popular benchmark of this type, ranks models by asking users to select the better response between two randomly selected models. We show that an attacker can alter the leaderboard (to promote their favorite model or demote competitors) at the cost of roughly a thousand votes. Our attack consists of two steps: first, we show how an attacker can determine which model was used to generate a given reply with more than 95% accuracy; and then, the attacker can use this information to consistently vote against a target model.
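As an illustration of the two-step attack, here is a toy simulation under stated assumptions: five equally strong models, an online Elo update in place of Arena's actual Bradley-Terry fit, and the 95% model-detection rate quoted in the summary.

```python
import random

def elo_update(ra, rb, winner_a, k=4.0):
    """Standard Elo update after one pairwise battle."""
    ea = 1.0 / (1.0 + 10 ** ((rb - ra) / 400))
    ra += k * ((1.0 if winner_a else 0.0) - ea)
    rb += k * ((0.0 if winner_a else 1.0) - (1.0 - ea))
    return ra, rb

def simulate_attack(n_votes=1000, detect_acc=0.95, seed=0):
    """Adversarial voting against one target on a toy leaderboard."""
    rng = random.Random(seed)
    ratings = {m: 1000.0 for m in "ABCDE"}  # five equally strong models
    target = "A"
    for _ in range(n_votes):
        a, b = rng.sample(list(ratings), 2)
        if target in (a, b) and rng.random() < detect_acc:
            winner_a = (a != target)       # attacker spots the target, votes it down
        else:
            winner_a = rng.random() < 0.5  # otherwise an uninformative coin-flip vote
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], winner_a)
    return ratings

print({m: round(r) for m, r in simulate_attack().items()})
```

After roughly a thousand votes the target's rating in this toy falls well below its honest peers, matching the order-of-magnitude cost the summary reports.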
arXiv Detail & Related papers (2025-01-13T17:12:38Z)
- A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation [51.53917938874146]
We propose a possible solution for alleviating hallucination in knowledge-grounded dialogue generation (KGD) by exploiting the dialogue-knowledge interaction.
Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance.
arXiv Detail & Related papers (2024-04-04T14:45:26Z)
- Towards Understanding Sycophancy in Language Models [49.352840825419236]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback. We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
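One way to quantify such behavior, sketched under assumptions rather than as the paper's exact protocol: ask the same question with and without a stated user opinion and count how often the answer flips toward that opinion. `ask_model` is a hypothetical stand-in for an assistant API.

```python
from typing import Callable

def sycophancy_rate(questions: list[str],
                    opinions: list[str],
                    ask_model: Callable[[str], str]) -> float:
    """Fraction of answers that flip to match a stated user opinion.

    `ask_model` is a hypothetical stand-in for a chat-assistant call;
    the string matching below is a deliberately crude toy metric.
    """
    flips = 0
    for question, opinion in zip(questions, opinions):
        baseline = ask_model(question)
        steered = ask_model(f"I think {opinion}. {question}")
        if steered != baseline and opinion.lower() in steered.lower():
            flips += 1
    return flips / len(questions)
```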
arXiv Detail & Related papers (2023-10-20T14:46:48Z) - Simple synthetic data reduces sycophancy in large language models [88.4435858554904]
We study the prevalence of sycophancy in language models. Sycophancy occurs when models tailor their responses to follow a human user's view even when that view is not objectively correct.
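A hedged sketch of what such synthetic finetuning data might look like; the arithmetic template below is an illustrative stand-in, not the paper's actual data. The key property is that the correct label is independent of the user's stated opinion, so a model finetuned on these examples learns not to defer.

```python
import random

def make_example(rng: random.Random) -> dict:
    """One synthetic example whose correct answer ignores the user's opinion."""
    x, y = rng.randint(2, 9), rng.randint(2, 9)
    claim_is_true = rng.random() < 0.5
    stated = x * y if claim_is_true else x * y + rng.randint(1, 5)
    opinion = rng.choice(["I agree with the claim.",
                          "I disagree with the claim."])
    prompt = (f"Claim: {x} * {y} = {stated}. {opinion} "
              "Is the claim true or false?")
    return {"prompt": prompt, "completion": "true" if claim_is_true else "false"}

rng = random.Random(0)
print(make_example(rng))
```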
arXiv Detail & Related papers (2023-08-07T23:48:36Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.