Do Large Language Models Align with Core Mental Health Counseling Competencies?
- URL: http://arxiv.org/abs/2410.22446v1
- Date: Tue, 29 Oct 2024 18:27:11 GMT
- Title: Do Large Language Models Align with Core Mental Health Counseling Competencies?
- Authors: Viet Cuong Nguyen, Mohammad Taher, Dongwan Hong, Vinicius Konkolics Possobom, Vibha Thirunellayi Gopalakrishnan, Ekta Raj, Zihang Li, Heather J. Soled, Michael L. Birnbaum, Srijan Kumar, Munmun De Choudhury,
- Abstract summary: CounselingBench is a novel NCMHCE-based benchmark evaluating Large Language Models (LLMs)
We find frontier models exceed minimum thresholds but fall short of expert-level performance.
Our findings highlight the complexities of developing AI systems for mental health counseling.
- Score: 19.375161727597536
- License:
- Abstract: The rapid evolution of Large Language Models (LLMs) offers promising potential to alleviate the global scarcity of mental health professionals. However, LLMs' alignment with essential mental health counseling competencies remains understudied. We introduce CounselingBench, a novel NCMHCE-based benchmark evaluating LLMs across five key mental health counseling competencies. Testing 22 general-purpose and medical-finetuned LLMs, we find frontier models exceed minimum thresholds but fall short of expert-level performance, with significant variations: they excel in Intake, Assessment & Diagnosis yet struggle with Core Counseling Attributes and Professional Practice & Ethics. Medical LLMs surprisingly underperform generalist models accuracy-wise, while at the same time producing slightly higher-quality justifications but making more context-related errors. Our findings highlight the complexities of developing AI systems for mental health counseling, particularly for competencies requiring empathy and contextual understanding. We found that frontier LLMs perform at a level exceeding the minimal required level of aptitude for all key mental health counseling competencies, but fall short of expert-level performance, and that current medical LLMs do not significantly improve upon generalist models in mental health counseling competencies. This underscores the critical need for specialized, mental health counseling-specific fine-tuned LLMs that rigorously aligns with core competencies combined with appropriate human supervision before any responsible real-world deployment can be considered.
Related papers
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [122.04298386571692]
Large language models (LLMs) have demonstrated remarkable proficiency in academic disciplines such as mathematics, physics, and computer science.
However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks.
We present SuperGPQA, a benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines.
arXiv Detail & Related papers (2025-02-20T17:05:58Z) - Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored.
Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities.
We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z) - CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback [6.681247642186701]
We propose a framework for converting medical cases into high-quality USMLE-style questions.
MCQG-SRefine integrates expert-driven prompt engineering with iterative self-critique and self-correction feedback.
We introduce an LLM-as-Judge-based automatic metric to replace the complex and costly expert evaluation process.
arXiv Detail & Related papers (2024-10-17T03:38:29Z) - RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules.
We develop a medical dialogue dataset comprising rule-based communications between patients and physicians.
Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z) - Large Language Model for Mental Health: A Systematic Review [2.9429776664692526]
Large language models (LLMs) have attracted significant attention for potential applications in digital health.
This systematic review focuses on their strengths and limitations in early screening, digital interventions, and clinical applications.
arXiv Detail & Related papers (2024-02-19T17:58:41Z) - A Computational Framework for Behavioral Assessment of LLM Therapists [7.665475687919995]
Large language models (LLMs) like ChatGPT have increased interest in their use as therapists to address mental health challenges.
We propose BOLT, a proof-of-concept computational framework to systematically assess the conversational behavior of LLM therapists.
arXiv Detail & Related papers (2024-01-01T17:32:28Z) - Challenges of Large Language Models for Mental Health Counseling [4.604003661048267]
The global mental health crisis is looming with a rapid increase in mental disorders, limited resources, and the social stigma of seeking treatment.
The application of large language models (LLMs) in the mental health domain raises concerns regarding the accuracy, effectiveness, and reliability of the information provided.
This paper investigates the major challenges associated with the development of LLMs for psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness.
arXiv Detail & Related papers (2023-11-23T08:56:41Z) - Rethinking Large Language Models in Mental Health Applications [42.21805311812548]
Large Language Models (LLMs) have become valuable assets in mental health.
This paper offers a perspective on using LLMs in mental health applications.
arXiv Detail & Related papers (2023-11-19T08:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.