A Computational Framework for Behavioral Assessment of LLM Therapists
- URL: http://arxiv.org/abs/2401.00820v1
- Date: Mon, 1 Jan 2024 17:32:28 GMT
- Title: A Computational Framework for Behavioral Assessment of LLM Therapists
- Authors: Yu Ying Chiu, Ashish Sharma, Inna Wanyin Lin, Tim Althoff
- Abstract summary: ChatGPT and other large language models (LLMs) have greatly increased interest in utilizing LLMs as therapists.
We propose BOLT, a novel computational framework to study the conversational behavior of LLMs when employed as therapists.
We compare the behavior of LLM therapists against that of high- and low-quality human therapy, and study how their behavior can be modulated to better reflect behaviors observed in high-quality therapy.
- Score: 8.373981505033864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of ChatGPT and other large language models (LLMs) has greatly
increased interest in utilizing LLMs as therapists to support individuals
struggling with mental health challenges. However, due to the lack of
systematic studies, our understanding of how LLM therapists behave, i.e., ways
in which they respond to clients, is significantly limited. Understanding their
behavior across a wide range of clients and situations is crucial to accurately
assess their capabilities and limitations in the high-risk setting of mental
health, where undesirable behaviors can lead to severe consequences. In this
paper, we propose BOLT, a novel computational framework to study the
conversational behavior of LLMs when employed as therapists. We develop an
in-context learning method to quantitatively measure the behavior of LLMs based
on 13 different psychotherapy techniques including reflections, questions,
solutions, normalizing, and psychoeducation. Subsequently, we compare the
behavior of LLM therapists against that of high- and low-quality human therapy,
and study how their behavior can be modulated to better reflect behaviors
observed in high-quality therapy. Our analysis of GPT and Llama-variants
reveals that these LLMs often resemble behaviors more commonly exhibited in
low-quality therapy rather than high-quality therapy, such as offering a higher
degree of problem-solving advice when clients share emotions, which is against
typical recommendations. At the same time, unlike low-quality therapy, LLMs
reflect significantly more upon clients' needs and strengths. Our analysis
framework suggests that despite the ability of LLMs to generate anecdotal
examples that appear similar to human therapists, LLM therapists are currently
not fully consistent with high-quality care, and thus require additional
research to ensure quality care.
Related papers
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - Therapy as an NLP Task: Psychologists' Comparison of LLMs and Human Peers in CBT [6.812247730094931]
We investigate the potential and limitations of using large language models (LLMs) as providers of evidence-based therapy.
We replicated publicly accessible mental health conversations rooted in Cognitive Behavioral Therapy (CBT) to compare session dynamics and counselor's CBT-based behaviors.
Our findings show that the peer sessions are characterized by empathy, small talk, therapeutic alliance, and shared experiences but often exhibit therapist drift.
arXiv Detail & Related papers (2024-09-03T19:19:13Z) - Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions [12.455050661682051]
We propose a framework that employs two large language models (LLMs) via role-playing for simulating counselor-client interactions.
Our framework involves two LLMs, one acting as a client equipped with a specific and real-life user profile and the other playing the role of an experienced counselor.
arXiv Detail & Related papers (2024-08-28T13:29:59Z) - An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice [0.0]
Large Language Models (LLMs) are non-deterministic, may provide incorrect or harmful responses, and cannot be regulated to assure quality control.
Our proposed framework refines LLM responses by restricting their primary knowledge base to domain-specific datasets containing validated medical information.
We conducted a validation study where expert cognitive behaviour therapy for insomnia therapists evaluated responses from the LLM in a blind format.
arXiv Detail & Related papers (2024-07-23T05:00:18Z) - LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C)
arXiv Detail & Related papers (2024-06-09T09:03:11Z) - A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health [42.711913023646915]
We propose a novel framework for evaluating the nuanced conversation abilities of Large Language Models (LLMs)
Within it, we develop a series of quantitative metrics developed from literature on using psychotherapy conversation analysis literature.
We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, through a verified mental health dataset.
arXiv Detail & Related papers (2024-03-08T23:46:37Z) - Evaluating the Efficacy of Interactive Language Therapy Based on LLM for
High-Functioning Autistic Adolescent Psychological Counseling [1.1780706927049207]
This study investigates the efficacy of Large Language Models (LLMs) in interactive language therapy for high-functioning autistic adolescents.
LLMs present a novel opportunity to augment traditional psychological counseling methods.
arXiv Detail & Related papers (2023-11-12T07:55:39Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.