Automatic Evaluation for Mental Health Counseling using LLMs
- URL: http://arxiv.org/abs/2402.11958v1
- Date: Mon, 19 Feb 2024 09:00:10 GMT
- Title: Automatic Evaluation for Mental Health Counseling using LLMs
- Authors: Anqi Li, Yu Lu, Nirui Song, Shuai Zhang, Lizhi Ma, Zhenzhong Lan
- Abstract summary: Existing methods that rely on self or third-party manual reports to assess the quality of counseling suffer from subjective biases and are time-consuming.
This paper proposes an innovative and efficient automatic approach using large language models (LLMs) to evaluate the working alliance in counseling conversations.
- Score: 19.71452604279078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality psychological counseling is crucial for mental health worldwide,
and timely evaluation is vital for ensuring its effectiveness. However,
obtaining professional evaluation for each counseling session is expensive and
challenging. Existing methods that rely on self or third-party manual reports
to assess the quality of counseling suffer from subjective biases and are
time-consuming.
To address these challenges, this paper proposes an innovative and efficient
automatic approach using large language models (LLMs) to evaluate the working
alliance in counseling conversations. We collected a comprehensive counseling
dataset and conducted multiple third-party evaluations based on therapeutic
relationship theory. Our LLM-based evaluation, combined with our guidelines,
shows high agreement with human evaluations and provides valuable insights into
counseling scripts. This highlights the potential of LLMs as supervisory tools
for psychotherapists. By integrating LLMs into the evaluation process, our
approach offers a cost-effective and dependable means of assessing counseling
quality, enhancing overall effectiveness.
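The abstract describes scoring the working alliance by combining an LLM with evaluation guidelines and checking agreement with human raters. A minimal sketch of such an LLM-as-judge pipeline is shown below; the rubric text, the goal/task/bond subscales, the 1-7 scale, and the `call_llm` stub are illustrative assumptions, not the paper's actual protocol.

```python
import re

GUIDELINES = (
    "Rate the working alliance in the counseling excerpt below on a 1-7 scale "
    "for each subscale (goal, task, bond). "
    "Answer in the form: goal=<n> task=<n> bond=<n>"
)

def build_prompt(transcript: str) -> str:
    """Combine the evaluation guidelines with the session transcript."""
    return f"{GUIDELINES}\n\nTranscript:\n{transcript}"

def parse_scores(reply: str) -> dict:
    """Extract the three subscale ratings from the model's reply."""
    scores = {k: int(v) for k, v in re.findall(r"(goal|task|bond)=(\d)", reply)}
    if set(scores) != {"goal", "task", "bond"}:
        raise ValueError(f"could not parse ratings from: {reply!r}")
    return scores

def evaluate_alliance(transcript: str, call_llm) -> dict:
    """Score one session with the injected LLM and return subscale ratings."""
    scores = parse_scores(call_llm(build_prompt(transcript)))
    scores["overall"] = round(sum(scores.values()) / 3, 2)
    return scores

# Stubbed model for demonstration; a real system would call an LLM API here.
def fake_llm(prompt: str) -> str:
    return "goal=5 task=4 bond=6"

result = evaluate_alliance("Counselor: ... Client: ...", fake_llm)
print(result)  # {'goal': 5, 'task': 4, 'bond': 6, 'overall': 5.0}
```

Injecting the model call as a parameter keeps the parsing and aggregation testable without network access; agreement with human ratings would then be measured over a batch of such scores.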
Related papers
- PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation [19.5523530046302]
We propose a specialized psychological large language model (LLM), named PsycoLLM, trained on a proposed high-quality psychological dataset.
To compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China.
The experimental results on the benchmark illustrate the effectiveness of PsycoLLM, which demonstrates superior performance compared to other LLMs.
arXiv Detail & Related papers (2024-07-08T08:25:56Z)
- CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling [27.193022503592342]
We propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling.
To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues.
A comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations.
arXiv Detail & Related papers (2024-05-26T05:18:00Z)
- Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study [17.32433545370711]
Comprehensive session summaries enable effective continuity in mental health counseling.
Manual summarization presents a significant challenge, diverting experts' attention from the core counseling process.
This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions.
arXiv Detail & Related papers (2024-02-29T11:29:47Z)
- LLM Agents for Psychology: A Study on Gamified Assessments [71.08193163042107]
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- "Am I A Good Therapist?" Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies [38.726068038788384]
We describe our platform and its performance, using a dataset of more than 5,000 recordings.
Our system gives comprehensive feedback to the therapist, including information about the dynamics of the session.
We are confident that widespread use of automated psychotherapy rating tools in the near future will augment experts' capabilities.
arXiv Detail & Related papers (2021-02-22T18:52:52Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
- Opportunities of a Machine Learning-based Decision Support System for Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practice relies mainly on a therapist's experience, and assessment is performed infrequently due to the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.