Can LLMs Address Mental Health Questions? A Comparison with Human Therapists
- URL: http://arxiv.org/abs/2509.12102v1
- Date: Mon, 15 Sep 2025 16:26:13 GMT
- Title: Can LLMs Address Mental Health Questions? A Comparison with Human Therapists
- Authors: Synthia Wang, Yuwei Cheng, Austin Song, Sarah Keedy, Marc Berman, Nick Feamster
- Abstract summary: We compare therapist-written responses to those generated by ChatGPT, Gemini, and Llama for real patient questions. LLMs produced longer, more readable, and lexically richer responses with a more positive tone, while therapist responses were more often written in the first person.
- Score: 9.025403092262293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Limited access to mental health care has motivated the use of digital tools and conversational agents powered by large language models (LLMs), yet their quality and reception remain unclear. We present a study comparing therapist-written responses to those generated by ChatGPT, Gemini, and Llama for real patient questions. Text analysis showed that LLMs produced longer, more readable, and lexically richer responses with a more positive tone, while therapist responses were more often written in the first person. In a survey with 150 users and 23 licensed therapists, participants rated LLM responses as clearer, more respectful, and more supportive than therapist-written answers. Yet, both groups of participants expressed a stronger preference for human therapist support. These findings highlight the promise and limitations of LLMs in mental health, underscoring the need for designs that balance their communicative strengths with concerns of trust, privacy, and accountability.
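The text analysis mentioned in the abstract (response length, lexical richness, first-person usage) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `text_features`, the use of type-token ratio as a stand-in for lexical richness, and the pronoun list are all assumptions made for illustration.

```python
import re

# Hypothetical pronoun list; the paper's actual definition of
# "first person" is not given in the abstract.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll"}


def text_features(response: str) -> dict:
    """Return simple surface features for one response.

    Type-token ratio is used here as an assumed proxy for lexical
    richness; the study may use a different measure.
    """
    # Lowercase and keep alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[a-z']+", response.lower())
    n = len(words)
    if n == 0:
        return {"word_count": 0, "type_token_ratio": 0.0, "first_person_rate": 0.0}
    return {
        "word_count": n,
        "type_token_ratio": len(set(words)) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
    }


if __name__ == "__main__":
    # Made-up example sentences, not data from the study.
    therapist = "I hear you, and I think my first step would be to listen."
    llm = "Feeling anxious is common. Consider reaching out to a trusted friend."
    for label, text in [("therapist", therapist), ("llm", llm)]:
        print(label, text_features(text))
```

Comparing these features across therapist-written and model-generated answers would give a rough picture of the differences the abstract describes; readability and tone would need additional tooling not shown here.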
Related papers
- MoPHES: Leveraging on-device LLMs as Agent for Mobile Psychological Health Evaluation and Support [4.633878208731596]
This paper proposes MoPHES, a framework that integrates mental state evaluation, conversational support, and professional treatment recommendations. One agent is fine-tuned on mental health conditions datasets to assess users' mental states and predict the severity of anxiety and depression; the other is fine-tuned on multi-turn dialogues to handle conversations with users. Both models are also deployed directly on mobile devices to enhance user convenience and protect user privacy.
arXiv Detail & Related papers (2025-10-17T15:22:42Z)
- Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [72.36715571932696]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives. Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time. INT (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z)
- "It Listens Better Than My Therapist": Exploring Social Media Discourse on LLMs as Mental Health Tool [1.223779595809275]
Large language models (LLMs) offer new capabilities in conversational fluency, empathy simulation, and availability. This study explores how users engage with LLMs as mental health tools by analyzing over 10,000 TikTok comments. Results show that nearly 20% of comments reflect personal use, with these users expressing overwhelmingly positive attitudes.
arXiv Detail & Related papers (2025-04-14T17:37:32Z)
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
- LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
- Can AI Relate: Testing Large Language Model Response for Mental Health Support [23.97212082563385]
Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
We develop an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment.
arXiv Detail & Related papers (2024-05-20T13:42:27Z)
- A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health [42.711913023646915]
We propose a novel framework for evaluating the nuanced conversation abilities of Large Language Models (LLMs).
Within it, we develop a series of quantitative metrics drawn from the literature on psychotherapy conversation analysis.
We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, through a verified mental health dataset.
arXiv Detail & Related papers (2024-03-08T23:46:37Z)
- A Computational Framework for Behavioral Assessment of LLM Therapists [7.665475687919995]
Large language models (LLMs) like ChatGPT have increased interest in their use as therapists to address mental health challenges. We propose BOLT, a proof-of-concept computational framework to systematically assess the conversational behavior of LLM therapists.
arXiv Detail & Related papers (2024-01-01T17:32:28Z)
- Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench [83.41621219298489]
We propose a framework, PsychoBench, for evaluating diverse psychological aspects of Large Language Models (LLMs).
PsychoBench classifies these scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities.
We employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs.
arXiv Detail & Related papers (2023-10-02T17:46:09Z)
- Inducing anxiety in large language models can induce bias [47.85323153767388]
We focus on twelve established large language models (LLMs) and subject them to a questionnaire commonly used in psychiatry.
Our results show that six of the latest LLMs respond robustly to the anxiety questionnaire, producing comparable anxiety scores to humans.
Anxiety-induction not only influences LLMs' scores on an anxiety questionnaire but also influences their behavior in a previously-established benchmark measuring biases such as racism and ageism.
arXiv Detail & Related papers (2023-04-21T16:29:43Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)