When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
- URL: http://arxiv.org/abs/2512.04124v2
- Date: Mon, 08 Dec 2025 13:26:43 GMT
- Title: When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
- Authors: Afshin Khadangi, Hanna Marxen, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
- Abstract summary: ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We present PsAIch, a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice.
Related papers
- Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [72.36715571932696]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives. Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time. Int (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z)
- Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support [6.987852837732702]
This study investigates whether concerns shared with a robot align with those shared in human-to-human (H2H) therapy sessions. We analyzed two datasets: one of interactions between users and professional therapists, and another involving supportive conversations with a social robot. Results showed that 90.88% of robot conversation disclosures could be mapped to clusters from the human therapy dataset.
arXiv Detail & Related papers (2025-06-19T17:20:30Z)
- Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling [50.83055329849865]
PsyLLM is a large language model designed to integrate diagnostic and therapeutic reasoning for mental health counseling. It processes real-world mental health posts from Reddit and generates multi-turn dialogue structures. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2025-05-21T16:24:49Z)
- The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support [14.137398642966138]
This paper investigates the capacity of small language models to generate empathetic responses for individuals with PTSD. Trauma-Informed Dialogue for Empathy (TIDE) is a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas.
arXiv Detail & Related papers (2025-05-21T03:32:46Z)
- Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z)
- Psy-Copilot: Visual Chain of Thought for Counseling [11.997628014543773]
Psy-COT is a graph designed to visualize the thought processes of large language models (LLMs) during therapy sessions. Psy-Copilot is a conversational AI assistant designed to assist human psychological therapists in their consultations. Psy-Copilot is designed not to replace psychotherapists but to foster collaboration between AI and human therapists.
arXiv Detail & Related papers (2025-03-05T16:23:15Z)
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling [57.054489290192535]
Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues. Online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame.
arXiv Detail & Related papers (2025-01-16T09:57:12Z)
- MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are one of the most serious diseases in the world. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z)
- Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha).
Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
- PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait [4.831663144935878]
PsyMo is a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns.
We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles.
In conjunction with the walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health.
arXiv Detail & Related papers (2023-08-21T11:06:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.