Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling
- URL: http://arxiv.org/abs/2505.15715v2
- Date: Mon, 03 Nov 2025 17:03:30 GMT
- Title: Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling
- Authors: He Hu, Yucheng Zhou, Juzheng Si, Qianning Wang, Hengheng Zhang, Fuji Ren, Fei Ma, Laizhong Cui, Qi Tian,
- Abstract summary: PsyLLM is a large language model designed to integrate diagnostic and therapeutic reasoning for mental health counseling.<n>It processes real-world mental health posts from Reddit and generates multi-turn dialogue structures.<n>Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models.
- Score: 50.83055329849865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) hold significant potential for mental health support, capable of generating empathetic responses and simulating therapeutic conversations. However, existing LLM-based approaches often lack the clinical grounding necessary for real-world psychological counseling, particularly in explicit diagnostic reasoning aligned with standards like the DSM/ICD and incorporating diverse therapeutic modalities beyond basic empathy or single strategies. To address these critical limitations, we propose PsyLLM, the first large language model designed to systematically integrate both diagnostic and therapeutic reasoning for mental health counseling. To develop PsyLLM, we design a novel automated data synthesis pipeline that processes real-world mental health posts collected from Reddit, where users frequently share psychological distress and seek community support. This pipeline processes real-world mental health posts, generates multi-turn dialogue structures, and leverages LLMs guided by international diagnostic standards (e.g., DSM/ICD) and multiple therapeutic frameworks (e.g., CBT, ACT, psychodynamic) to simulate detailed clinical reasoning processes. Rigorous multi-dimensional filtering ensures the generation of high-quality, clinically aligned dialogue data. In addition, we introduce a new benchmark and evaluation protocol, assessing counseling quality across four key dimensions. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models on this benchmark. The model weights and dataset have been publicly released at https://github.com/Emo-gml/PsyLLM.
Related papers
- MindEval: Benchmarking Language Models on Multi-turn Mental Health Support [10.524387723320432]
MindEval is a framework for automatically evaluating language models in realistic, multi-turn mental health therapy conversations.<n>We quantitatively validate the realism of our simulated patients against human-generated text and by demonstrating strong correlations between automatic and human expert judgments.<n>We evaluate 12 state-of-the-art LLMs and show that all models struggle, scoring below 4 out of 6 on average, with particular weaknesses in problematic AI-specific patterns of communication.
arXiv Detail & Related papers (2025-11-23T15:19:29Z) - Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires [5.163738939075784]
We present an LLM-driven pipeline that generates synthetic counseling dialogues based on structured client profiles and psychological questionnaires.<n>Our framework, SQPsych, converts structured psychological input into natural language dialogues through therapist-client simulations.<n>Our findings highlight the potential of synthetic data to enable scalable, data-secure, and clinically informed AI for mental health support.
arXiv Detail & Related papers (2025-10-29T10:55:52Z) - Psychiatry-Bench: A Multi-Task Benchmark for LLMs in Psychiatry [1.2879523047871226]
PsychiatryBench is a rigorously curated benchmark grounded exclusively in expert-validated psychiatric textbooks and casebooks.<n> PsychiatryBench comprises eleven distinct question-answering tasks ranging from diagnostic reasoning and treatment planning to longitudinal follow-up, management planning, clinical approach, sequential case analysis, and multiple-choice/extended matching formats totaling over 5,300 expert-annotated items.
arXiv Detail & Related papers (2025-09-07T20:57:24Z) - Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.<n>This paper provides the first systematic review of this emerging field.<n>We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z) - Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [92.93521294357058]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives.<n>Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time.<n>Int (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis.<n>MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue [14.95390953068765]
Large language models (LLMs) have demonstrated excellent capabilities in the field of biomedical question answering, but their application in real-world clinical consultations still faces core challenges.<n>We propose Ours, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty.<n>Our approach shows immense practical value by reducing misdiagnosis risks in time-pressured settings, freeing clinicians for complex cases, and pioneering a strategy to optimize medical resource allocation and alleviate workforce shortages.
arXiv Detail & Related papers (2025-05-26T07:48:14Z) - A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions [13.17064228097947]
Large language models (LLMs) offer promising solutions in psychotherapy by enhancing the assessment, diagnosis, and treatment of mental health conditions.<n>This survey provides a comprehensive overview of the current landscape of LLM applications in psychotherapy.<n>We present a novel conceptual taxonomy to organize the psychotherapy process into three core components: assessment, diagnosis, and treatment, and examine the challenges and advancements in each area.
arXiv Detail & Related papers (2025-02-16T12:18:40Z) - Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations [74.83732294523402]
We introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards.<n>We also explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes.<n>Experiments show that dialogue-tuned models outperform traditional methods, with improvements of $9.64%$ in multi-round reasoning scenarios and $6.18%$ in accuracy in a noisy environment.
arXiv Detail & Related papers (2025-01-29T18:58:48Z) - AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling [57.054489290192535]
Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues.<n>Online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame.
arXiv Detail & Related papers (2025-01-16T09:57:12Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory [35.41386783586689]
This paper introduces the Agent Mental Clinic (AMC), a self-improving conversational agent system designed to enhance depression diagnosis through simulated dialogues between patient and psychiatrist agents.
We design a psychiatrist agent consisting of a tertiary memory structure, a dialogue control and a memory sampling module, fully leveraging the skills reflected by the psychiatrist agent, achieving great accuracy on depression risk and suicide risk diagnosis via conversation.
arXiv Detail & Related papers (2024-09-20T14:25:08Z) - Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy [6.952909762512736]
We examine the effects of prompt engineering to guide Large Language Models (LLMs) in delivering parts of a Problem-Solving Therapy session via text.
We demonstrate that the models' capability to deliver protocolized therapy can be improved with the proper use of prompt engineering methods.
arXiv Detail & Related papers (2024-08-27T17:25:16Z) - PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation [32.40846713004979]
PsycoLLM is trained on a proposed high-quality psychological dataset.<n>We augment this process with real-world psychological case backgrounds extracted from online platforms.<n>We develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China.
arXiv Detail & Related papers (2024-07-08T08:25:56Z) - LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C)
arXiv Detail & Related papers (2024-06-09T09:03:11Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.