Exploring the Efficacy of Large Language Models in Summarizing Mental
Health Counseling Sessions: A Benchmark Study
- URL: http://arxiv.org/abs/2402.19052v1
- Date: Thu, 29 Feb 2024 11:29:47 GMT
- Title: Exploring the Efficacy of Large Language Models in Summarizing Mental
Health Counseling Sessions: A Benchmark Study
- Authors: Prottay Kumar Adhikary, Aseem Srivastava, Shivani Kumar, Salam Michael
Singh, Puneet Manuja, Jini K Gopinath, Vijay Krishnan, Swati Kedia, Koushik
Sinha Deb, Tanmoy Chakraborty
- Abstract summary: Comprehensive summaries of sessions enable effective continuity in mental health counseling.
Manual summarization presents a significant challenge, diverting experts' attention from the core counseling process.
This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions.
- Score: 17.32433545370711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comprehensive summaries of sessions enable effective continuity in
mental health counseling and facilitate informed therapy planning. Yet, manual
summarization presents a significant challenge, diverting experts' attention
from the core counseling process. This study evaluates the effectiveness of
state-of-the-art Large Language Models (LLMs) in selectively summarizing
various components of therapy sessions through aspect-based summarization,
aiming to benchmark their performance. We introduce MentalCLOUDS, a
counseling-component guided summarization dataset consisting of 191 counseling
sessions with summaries focused on three distinct counseling components (i.e.,
counseling aspects). Additionally, we assess the capabilities of 11
state-of-the-art LLMs in addressing the task of component-guided summarization
in counseling. The generated summaries are evaluated quantitatively using
standard summarization metrics and verified qualitatively by mental health
professionals. Our findings demonstrate the superior performance of
task-specific LLMs such as MentalLlama, Mistral, and MentalBART on standard
quantitative metrics such as ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore across
all counseling components. Further, expert evaluation reveals that Mistral
outperforms both MentalLlama and MentalBART on six parameters -- affective
attitude, burden, ethicality, coherence, opportunity costs, and perceived
effectiveness. However, all three models share a common weakness, leaving
clear room for improvement on the opportunity-costs and
perceived-effectiveness metrics.
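As a concrete illustration of the quantitative evaluation described above, the following is a minimal sketch of scoring one generated aspect summary against a reference with ROUGE-1/2/L and BERTScore. The summary strings are invented, and the choice of the rouge-score and bert-score packages is an assumption for illustration, not the paper's actual pipeline.

```python
# Minimal sketch: score a generated aspect summary against a reference.
# The summaries below are hypothetical examples, not MentalCLOUDS data.
from rouge_score import rouge_scorer        # pip install rouge-score
from bert_score import score as bert_score  # pip install bert-score

reference = "The patient reports persistent low mood and disrupted sleep."
generated = "The client describes ongoing sadness and trouble sleeping."

# ROUGE-1/2/L F-measures, the lexical-overlap metrics cited above.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
rouge = {k: v.fmeasure for k, v in scorer.score(reference, generated).items()}

# BERTScore F1 rewards semantic overlap beyond exact n-gram matches.
_, _, f1 = bert_score([generated], [reference], lang="en")

print(rouge, float(f1.mean()))
```

BERTScore complements ROUGE here: paraphrases such as "low mood" versus "sadness" earn semantic credit that n-gram overlap alone would miss.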
Related papers
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH:
I. Basic CBT knowledge acquisition, with a multiple-choice question task;
II. Cognitive model understanding, with cognitive distortion classification, primary core belief classification, and fine-grained core belief classification tasks;
III. Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios.
arXiv Detail & Related papers (2024-10-17T04:52:57Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human assistants.
This paper presents a framework for investigating the psychological dimensions of LLMs, including psychological dimension identification, assessment dataset curation, and assessment with result validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Optimizing Psychological Counseling with Instruction-Tuned Large Language Models [9.19192059750618]
This paper explores the application of large language models (LLMs) in psychological counseling.
We present a method for instruction tuning LLMs with specialized prompts to enhance their performance in providing empathetic, relevant, and supportive responses.
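To make the instruction-tuning idea concrete, here is a hedged sketch of what a single training record for counseling-style responses might look like; the field names and wording are hypothetical and not taken from the paper.

```python
# Hypothetical instruction-tuning record (Alpaca-style fields); the
# instruction, input, and output texts are invented for illustration.
record = {
    "instruction": ("You are a supportive counselor. Respond with empathy, "
                    "reflect the client's feelings, and suggest one small, "
                    "concrete next step."),
    "input": "I feel overwhelmed at work and can't focus on anything.",
    "output": ("It sounds exhausting to carry that much pressure. Could we "
               "pick one task to focus on for just ten minutes today?"),
}
```

Fine-tuning on many such records nudges the model toward the empathetic, supportive register the paper targets.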
arXiv Detail & Related papers (2024-06-19T15:13:07Z)
- ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification [14.644324586153866]
We propose ERD, which improves cognitive distortion classification performance with the aid of additional modules.
Our experimental results on a public dataset show that ERD improves both the multi-class F1 score and the binary specificity score (both metrics are sketched below).
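For reference, here is a minimal sketch of the two metrics ERD reports, computed on hypothetical labels; the distortion classes and predictions are invented, not ERD's data or modules.

```python
# Minimal sketch of multi-class macro F1 and binary specificity; the
# labels and predictions below are hypothetical examples.
from sklearn.metrics import confusion_matrix, f1_score

# Multi-class setting: invented cognitive-distortion labels.
y_true = ["catastrophizing", "mind_reading", "none", "labeling", "none"]
y_pred = ["catastrophizing", "none", "none", "labeling", "mind_reading"]
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Binary setting: distortion present (1) vs. absent (0).
yb_true = [1, 1, 0, 1, 0]
yb_pred = [1, 0, 0, 1, 1]
tn, fp, fn, tp = confusion_matrix(yb_true, yb_pred).ravel()
specificity = tn / (tn + fp)  # true-negative rate

print(macro_f1, specificity)
```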
arXiv Detail & Related papers (2024-03-21T09:28:38Z)
- PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents [68.50571379012621]
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z)
- PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models [34.09419351705938]
This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating Large Language Models (LLMs).
This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks.
arXiv Detail & Related papers (2023-11-15T18:32:27Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
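A minimal sketch of the questionnaire-as-dialogue idea follows, assuming the OpenAI chat API (matching the GPT-3.5 setting above); the questionnaire items, prompts, and scoring are simplified stand-ins for PsyCoT's actual protocol.

```python
# Sketch: administer questionnaire items one turn at a time, in the spirit
# of PsyCoT. Items and prompts are hypothetical; assumes OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()
ITEMS = [
    "The author is someone who is talkative.",
    "The author is someone who tends to find fault with others.",
]

history = [{"role": "system",
            "content": "Given the text below, rate each statement about its "
                       "author from 1 (disagree) to 5 (agree), then give a "
                       "final personality judgment.\n\nTEXT: ..."}]
for item in ITEMS:  # one questionnaire item per dialogue turn
    history.append({"role": "user", "content": item})
    reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                           messages=history)
    history.append({"role": "assistant",
                    "content": reply.choices[0].message.content})
```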
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Towards Interpretable Mental Health Analysis with Large Language Models [27.776003210275608]
We evaluate the mental health analysis and emotional reasoning ability of large language models (LLMs) on 11 datasets across 5 tasks.
Based on prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions.
We conduct strict human evaluations to assess the quality of the generated explanations, leading to a novel dataset with 163 human-assessed explanations.
arXiv Detail & Related papers (2023-04-06T19:53:59Z)
- Counseling Summarization using Mental Health Knowledge Guided Utterance Filtering [25.524804770124145]
The aim is mental health counseling summarization that builds upon domain knowledge and helps clinicians quickly glean meaning.
We create a new dataset by annotating 12.9K utterances with counseling components and writing a reference summary for each dialogue.
ConSum comprises three independent modules. First, to assess the presence of depressive symptoms, it filters utterances using the Patient Health Questionnaire (PHQ-9).
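As a toy illustration of that first module's role, here is a hedged sketch of PHQ-9-guided utterance filtering; ConSum's actual filter is learned, so the keyword lexicon below is a hypothetical stand-in.

```python
# Toy PHQ-9-guided utterance filter; the cue lexicon and dialogue are
# invented, and ConSum's real module is model-based, not keyword-based.
PHQ9_CUES = {
    "little interest", "feeling down", "depressed", "hopeless",
    "trouble sleeping", "tired", "poor appetite", "worthless",
    "trouble concentrating", "restless",
}

def keep_utterance(utterance: str) -> bool:
    """Keep an utterance if it mentions any PHQ-9 symptom cue."""
    text = utterance.lower()
    return any(cue in text for cue in PHQ9_CUES)

dialogue = [
    "I have been feeling down almost every day.",
    "Did you watch the game last night?",
    "I also have trouble sleeping lately.",
]
print([u for u in dialogue if keep_utterance(u)])  # symptom-bearing lines
```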
arXiv Detail & Related papers (2022-06-08T13:38:47Z)
- MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
- Opportunities of a Machine Learning-based Decision Support System for Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practices rely mainly on a therapist's experience, and assessments are performed infrequently due to limited therapist availability.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.