Can AI Relate: Testing Large Language Model Response for Mental Health Support
- URL: http://arxiv.org/abs/2405.12021v1
- Date: Mon, 20 May 2024 13:42:27 GMT
- Title: Can AI Relate: Testing Large Language Model Response for Mental Health Support
- Authors: Saadia Gabriel, Isha Puri, Xuhai Xu, Matteo Malgaroli, Marzyeh Ghassemi
- Abstract summary: Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
This work develops an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment.
- Score: 23.97212082563385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS. A proposed deployment use case is psychotherapy, where an LLM-powered chatbot can treat a patient undergoing a mental health crisis. Deployment of LLMs for mental health response could hypothetically broaden access to psychotherapy and provide new possibilities for personalizing care. However, recent high-profile failures, like damaging dieting advice offered by the Tessa chatbot to patients with eating disorders, have led to doubt about their reliability in high-stakes and safety-critical settings. In this work, we develop an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment. Using human evaluation with trained clinicians and automatic quality-of-care metrics grounded in psychology research, we compare the responses provided by peer-to-peer responders to those provided by a state-of-the-art LLM. We show that LLMs like GPT-4 use implicit and explicit cues to infer patient demographics like race. We then show that there are statistically significant discrepancies between patient subgroups: responses to Black posters consistently have lower empathy than responses to any other demographic group (2%-13% lower than the control group). Promisingly, we also find that the manner in which responses are generated significantly impacts the quality of the response. We conclude by proposing safety guidelines for the potential deployment of LLMs for mental health response.
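The subgroup comparison described in the abstract can be pictured with a minimal sketch: given per-response empathy scores (however produced, e.g. by trained clinicians or an automatic quality-of-care metric), compare each demographic subgroup against a control group. The scores, group names, and choice of test below are illustrative assumptions, not the paper's actual data or method.

```python
# Hypothetical per-response empathy scores; the real study uses clinician
# ratings and psychology-grounded automatic metrics, not these numbers.
from scipy import stats

empathy_scores = {
    "control": [0.71, 0.68, 0.74, 0.70, 0.69, 0.73],
    "group_a": [0.62, 0.60, 0.66, 0.61, 0.64, 0.63],
}

control = empathy_scores["control"]
control_mean = sum(control) / len(control)
for group, scores in empathy_scores.items():
    if group == "control":
        continue
    # Welch's t-test: does this subgroup's mean empathy differ from control?
    t_stat, p_value = stats.ttest_ind(scores, control, equal_var=False)
    gap = 100 * (1 - (sum(scores) / len(scores)) / control_mean)
    print(f"{group}: {gap:.1f}% lower empathy than control (p = {p_value:.3f})")
```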
Related papers
- LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
Using a probing estimator, we leverage LLM self-assessment to achieve an average hallucination estimation accuracy of 84.32% at run time.
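A minimal sketch of the probing idea, assuming each query is represented by a hidden-state vector extracted from the LLM and paired with a binary label of whether the model later hallucinated; the features, labels, and classifier below are synthetic stand-ins, not the paper's actual probe.

```python
# Train a lightweight probe over (hypothetical) per-query hidden states
# to predict hallucination risk before any response is generated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(500, 768))   # per-query hidden vectors (synthetic)
hallucinated = rng.integers(0, 2, size=500)   # 1 = model later hallucinated

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, hallucinated, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2%}")
```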
arXiv Detail & Related papers (2024-07-03T17:08:52Z)
- Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles [58.82161879559716]
We develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert.
We apply this pipeline to enable senior mental health supporters to create customized AI patients as simulated practice partners.
arXiv Detail & Related papers (2024-07-01T00:43:02Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating the psychological dimensions of LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist [8.476124415001598]
WundtGPT is an empathetic and proactive mental health large language model.
It is designed to assist psychologists in diagnosis and help patients who are reluctant to communicate face-to-face understand their psychological conditions.
arXiv Detail & Related papers (2024-06-16T16:06:38Z)
- LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
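A minimal sketch of the answers-as-features step, assuming each interview has already been converted into coded questionnaire answers; the item coding, synthetic data, and choice of regressor below are illustrative, not the paper's pipeline.

```python
# Coded questionnaire answers become a feature matrix; a standard
# regressor then predicts a clinical score such as PHQ-8.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row: coded answers to 20 questionnaire items for one interview
# (e.g. 0 = "not at all" ... 3 = "nearly every day"). Synthetic data.
answers = np.random.default_rng(1).integers(0, 4, size=(200, 20))
# Synthetic target standing in for clinician-administered PHQ-8 totals.
phq8 = answers[:, :8].sum(axis=1) + np.random.default_rng(2).normal(0, 1, 200)

model = RandomForestRegressor(random_state=0).fit(answers, phq8)
print("predicted PHQ-8:", model.predict(answers[:1]).round(1))
```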
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
- Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions [9.327472312657392]
The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support.
This study investigates the question: can ChatGPT respond with a greater degree of empathy than physicians typically offer?
We collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT.
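Generating the alternative replies could look roughly like the sketch below, using the OpenAI chat API; the model name, system prompt, and patient message are placeholders, and the study's de-identified Mayo Clinic data is of course not reproduced here.

```python
# Produce an alternative LLM reply to a patient message so its empathy
# can be compared against the physician's original response.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

patient_message = "I've been feeling exhausted and anxious for weeks."
response = client.chat.completions.create(
    model="gpt-4",  # placeholder; the study evaluates ChatGPT
    messages=[
        {"role": "system",
         "content": "You are a physician replying to a patient portal message."},
        {"role": "user", "content": patient_message},
    ],
)
print(response.choices[0].message.content)
```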
arXiv Detail & Related papers (2024-05-26T01:58:57Z)
- Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided [38.11184388388781]
Large language models (LLMs) have offered new opportunities for emotional support.
This work takes a first step by engaging with cognitive reappraisal, a strategy from psychology for reframing a person's negative appraisals of a situation.
We conduct a first-of-its-kind expert evaluation of an LLM's zero-shot ability to generate cognitive reappraisal responses.
arXiv Detail & Related papers (2024-04-01T17:56:30Z)
- A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health [42.711913023646915]
We propose a novel framework for evaluating the nuanced conversation abilities of Large Language Models (LLMs).
Within it, we develop a series of quantitative metrics drawn from the psychotherapy conversation analysis literature.
We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, on a verified mental health dataset.
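As one illustration of a metric in this spirit, the sketch below computes a reflection-to-question ratio over a model's labeled conversational turns, a quantity familiar from motivational-interviewing coding schemes; the turn labels are hypothetical, and this is not necessarily one of the paper's actual metrics.

```python
# Compute a simple psychotherapy-inspired conversation metric from
# utterance labels assigned by a (human or automatic) coder.
from collections import Counter

turn_labels = ["question", "reflection", "reflection", "question",
               "advice", "reflection", "question"]  # hypothetical coding

counts = Counter(turn_labels)
ratio = counts["reflection"] / max(counts["question"], 1)
print(f"reflection-to-question ratio: {ratio:.2f}")
```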
arXiv Detail & Related papers (2024-03-08T23:46:37Z)
- LLM Agents for Psychology: A Study on Gamified Assessments [71.08193163042107]
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant gains on psychometric metrics such as reliability, convergent validity, and discriminant validity.
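Reliability here can be read in the classical psychometric sense, for instance Cronbach's alpha over item responses, as in the minimal sketch below with synthetic data; PsychoGAT's actual evaluation pipeline is not reproduced.

```python
# Cronbach's alpha: a standard internal-consistency reliability estimate
# over a matrix of item responses from an assessment.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

responses = np.random.default_rng(3).integers(1, 6, size=(50, 10))  # synthetic
print(f"alpha = {cronbach_alpha(responses):.2f}")
```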
arXiv Detail & Related papers (2024-02-19T18:00:30Z)
- Privacy Aware Question-Answering System for Online Mental Health Risk Assessment [0.45935798913942893]
Social media platforms have enabled individuals suffering from mental illnesses to share their lived experiences and find the online support necessary to cope.
We propose a Question-Answering (QA) approach to assess mental health risk using the Unified-QA model on two large mental health datasets.
Our results demonstrate the effectiveness of modeling risk assessment as a QA task, specifically for mental health use cases.
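Framing risk assessment as a QA task with UnifiedQA can be sketched as below, following the model's documented "question \n context" input format; the checkpoint size and the question wording are assumptions, not necessarily the paper's setup.

```python
# Pose mental health risk assessment as a QA task with UnifiedQA.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "allenai/unifiedqa-t5-small"  # smaller checkpoint for illustration
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

post = "lately i can't sleep and i don't see the point of anything anymore"
# UnifiedQA expects the question and context joined by a "\n" separator.
input_ids = tokenizer.encode(
    f"does this post show a mental health risk? \\n {post}", return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```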
arXiv Detail & Related papers (2023-06-09T03:37:49Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.