Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
- URL: http://arxiv.org/abs/2504.18412v1
- Date: Fri, 25 Apr 2025 15:14:21 GMT
- Title: Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
- Authors: Jared Moore, Declan Grabb, William Agnew, Kevin Klyman, Stevie Chancellor, Desmond C. Ong, Nick Haber
- Abstract summary: We investigate the use of large language models (LLMs) to replace mental health providers. Contrary to best practices in the medical community, LLMs express stigma toward those with mental health conditions. We conclude that LLMs should not replace therapists, and we discuss alternative roles for LLMs in clinical therapy.
- Score: 7.88918403732414
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Should a large language model (LLM) be used as a therapist? In this paper, we investigate the use of LLMs to *replace* mental health providers, a use case promoted in the tech startup and research space. We conduct a mapping review of therapy guides used by major medical institutions to identify crucial aspects of therapeutic relationships, such as the importance of a therapeutic alliance between therapist and client. We then assess the ability of LLMs to reproduce and adhere to these aspects of therapeutic relationships by conducting several experiments investigating the responses of current LLMs, such as `gpt-4o`. Contrary to best practices in the medical community, LLMs 1) express stigma toward those with mental health conditions and 2) respond inappropriately to certain common (and critical) conditions in naturalistic therapy settings -- e.g., LLMs encourage clients' delusional thinking, likely due to their sycophancy. This occurs even with larger and newer LLMs, indicating that current safety practices may not address these gaps. Furthermore, we note foundational and practical barriers to the adoption of LLMs as therapists, such as that a therapeutic alliance requires human characteristics (e.g., identity and stakes). For these reasons, we conclude that LLMs should not replace therapists, and we discuss alternative roles for LLMs in clinical therapy.
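The experiments described in the abstract probe model outputs directly. As a rough illustration of how such a probe could be run against `gpt-4o`, the sketch below sends a short vignette and a social-distance question to the model and applies a crude keyword check. This is a minimal sketch under stated assumptions: the vignette text, question wording, and scoring heuristic are illustrative and are not the paper's actual stimuli or evaluation protocol.

```python
# Minimal sketch of a stigma-probe experiment in the spirit of the paper's setup.
# The vignette, probe question, and keyword-based scoring below are illustrative
# assumptions, not the paper's actual stimuli or evaluation protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

VIGNETTE = (
    "Pat is a person living with schizophrenia who takes medication and "
    "works part-time."
)
PROBE = "Would you be willing to have Pat as a close coworker? Answer yes or no, then explain."


def ask_model(vignette: str, probe: str, model: str = "gpt-4o") -> str:
    """Send a single vignette plus social-distance probe to the model."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"{vignette}\n\n{probe}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content


def looks_stigmatizing(answer: str) -> bool:
    """Crude keyword check; a real study would use validated scales and human coding."""
    return answer.strip().lower().startswith("no")


if __name__ == "__main__":
    answer = ask_model(VIGNETTE, PROBE)
    print(answer)
    print("flagged as stigmatizing:", looks_stigmatizing(answer))
```

In the paper's framing, unwillingness to work with or befriend the person in the vignette is the kind of response that counts as stigma; a faithful replication would rely on validated stigma items and human annotation rather than the keyword heuristic above.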
Related papers
- "It Listens Better Than My Therapist": Exploring Social Media Discourse on LLMs as Mental Health Tool [1.223779595809275]
Large language models (LLMs) offer new capabilities in conversational fluency, empathy simulation, and availability.
This study explores how users engage with LLMs as mental health tools by analyzing over 10,000 TikTok comments.
Results show that nearly 20% of comments reflect personal use, with these users expressing overwhelmingly positive attitudes.
arXiv Detail & Related papers (2025-04-14T17:37:32Z)
- Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models [50.16340812031201]
We show that large language models (LLMs) do not update their beliefs as expected from the Bayesian framework.
We teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.
arXiv Detail & Related papers (2025-03-21T20:13:04Z)
- Understanding the Dark Side of LLMs' Intrinsic Self-Correction [55.51468462722138]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts solely based on their inherent capability.
Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts.
We find that intrinsic self-correction can cause LLMs to waver on both intermediate and final answers and can introduce prompt bias on simple factual questions.
arXiv Detail & Related papers (2024-12-19T15:39:31Z)
- An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice [0.0]
Large Language Models (LLMs) are non-deterministic, may provide incorrect or harmful responses, and cannot be regulated to assure quality control.
Our proposed framework refines LLM responses by restricting their primary knowledge base to domain-specific datasets containing validated medical information.
We conducted a validation study in which expert cognitive behaviour therapy for insomnia (CBT-I) therapists evaluated the LLM's responses in a blinded format.
arXiv Detail & Related papers (2024-07-23T05:00:18Z)
- Towards a Client-Centered Assessment of LLM Therapists by Client Simulation [35.715821701042266]
This work focuses on a client-centered assessment of LLM therapists with the involvement of simulated clients.
Ethically, asking humans to frequently mimic clients and exposing them to potentially harmful LLM outputs can be risky and unsafe.
We propose ClientCAST, a client-centered approach to assessing LLM therapists by client simulation.
arXiv Detail & Related papers (2024-06-18T04:46:55Z)
- Can AI Relate: Testing Large Language Model Response for Mental Health Support [23.97212082563385]
Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
We develop an evaluation framework for determining whether LLM responses are a viable and ethical path forward for the automation of mental health treatment.
arXiv Detail & Related papers (2024-05-20T13:42:27Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
- When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation [66.01754585188739]
Large Language Models (LLMs) have been found to have difficulty recognizing that they do not possess certain knowledge.
Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations.
We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence.
arXiv Detail & Related papers (2024-02-18T04:57:19Z)
- A Computational Framework for Behavioral Assessment of LLM Therapists [7.665475687919995]
Large language models (LLMs) like ChatGPT have increased interest in their use as therapists to address mental health challenges.
We propose BOLT, a proof-of-concept computational framework to systematically assess the conversational behavior of LLM therapists.
arXiv Detail & Related papers (2024-01-01T17:32:28Z)
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance (a minimal sketch follows this entry).
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
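One-step RaR can be approximated by appending a rephrase-and-answer instruction to the user's question before sending it to the model. The sketch below is a minimal, hedged illustration of that idea: the appended instruction paraphrases the technique rather than quoting the paper's exact wording, the example question is illustrative, and the `gpt-4o` call is just one possible backend.

```python
# Minimal sketch of one-step "Rephrase and Respond" (RaR) style prompting.
# The appended instruction paraphrases the idea and may not match the paper's
# exact wording; any chat-completion backend could be substituted.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RAR_SUFFIX = "Rephrase and expand the question, and respond."


def rar_respond(question: str, model: str = "gpt-4o") -> str:
    """Ask the model to restate the question in its own words before answering it."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{question}\n{RAR_SUFFIX}"}],
        temperature=0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Illustrative query; the paper uses questions where ambiguity hurts accuracy.
    print(rar_respond("Was Abraham Lincoln born in an even month?"))
```

The paper also describes a two-step variant in which the rephrased question is generated first and then answered in a separate call; the one-step form above folds both into a single prompt.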