PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor
- URL: http://arxiv.org/abs/2601.01802v3
- Date: Thu, 08 Jan 2026 13:52:50 GMT
- Title: PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor
- Authors: Qianjun Pan, Junyi Wang, Jie Zhou, Yutao Yang, Junsong Li, Kaiyin Xu, Yougen Zhou, Yihan Li, Jingyuan Zhao, Qin Chen, Ningning Zhou, Kai Chen, Liang He
- Abstract summary: PsychEval is a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges. It demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills.
- Score: 26.81428514159215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To develop a reliable AI for psychological assessment, we introduce PsychEval, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges: 1) Can we train a highly realistic AI counselor? Realistic counseling is a longitudinal task requiring sustained memory and dynamic goal tracking. We propose a multi-session benchmark (spanning 6-10 sessions across three distinct stages) that demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills. 2) How to train a multi-therapy AI counselor? While existing models often focus on a single therapy, complex cases frequently require flexible strategies among various therapies. We construct a diverse dataset covering five therapeutic modalities (Psychodynamic, Behaviorism, CBT, Humanistic Existentialist, and Postmodernist) alongside an integrative therapy with a unified three-stage clinical framework across six core psychological topics. 3) How to systematically evaluate an AI counselor? We establish a holistic evaluation framework with 18 therapy-specific and therapy-shared metrics across Client-Level and Counselor-Level dimensions. To support this, we also construct over 2,000 diverse client profiles. Extensive experimental analysis fully validates the superior quality and clinical fidelity of our dataset. Crucially, PsychEval transcends static benchmarking to serve as a high-fidelity reinforcement learning environment that enables the self-evolutionary training of clinically responsible and adaptive AI counselors.
Related papers
- TheraMind: A Strategic and Adaptive Agent for Longitudinal Psychological Counseling [53.46927050949822]
We introduce TheraMind, a strategic and adaptive agent for longitudinal psychological counseling. The cornerstone of TheraMind is a novel dual-loop architecture that decouples the counseling process into an Intra-Session Loop and a Cross-Session Loop. The Cross-Session Loop empowers the agent with long-term adaptability by evaluating the efficacy of the applied therapy after each session and adjusting the method for subsequent interactions.
arXiv Detail & Related papers (2025-10-29T17:54:20Z)
- MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions [58.61680631581921]
We introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. Empirical results show that MAGneT significantly outperforms existing methods in quality, diversity, and therapeutic alignment of the generated counseling sessions.
arXiv Detail & Related papers (2025-09-04T12:59:24Z)
- DiaCBT: A Long-Periodic Dialogue Corpus Guided by Cognitive Conceptualization Diagram for CBT-based Psychological Counseling [29.386911644663304]
Large language models (LLMs) offer a promising solution to expand access to mental health services. We construct a long-periodic dialogue corpus for counseling based on cognitive behavioral therapy (CBT). Our dataset includes multiple sessions for each counseling case and incorporates cognitive conceptualization diagrams (CCDs) to guide client simulation.
arXiv Detail & Related papers (2025-09-03T04:17:19Z)
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z)
- Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling [50.83055329849865]
PsyLLM is a large language model designed to integrate diagnostic and therapeutic reasoning for mental health counseling. It processes real-world mental health posts from Reddit and generates multi-turn dialogue structures. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2025-05-21T16:24:49Z)
- PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System [6.868956036918275]
PsyCounAssist is a comprehensive AI-powered counseling system specifically designed to augment psychological counseling practices. It integrates multimodal emotion recognition, automated structured session reporting, and personalized AI-generated follow-up support. Deployed on Android-based tablet devices, the system demonstrates practical applicability and flexibility in real-world counseling scenarios.
arXiv Detail & Related papers (2025-04-23T09:49:05Z)
- AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling [57.054489290192535]
Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues. Online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame.
arXiv Detail & Related papers (2025-01-16T09:57:12Z)
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios.
arXiv Detail & Related papers (2024-10-17T04:52:57Z)
- Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory [24.937025825501998]
We create a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT).
We benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations.
Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent.
arXiv Detail & Related papers (2024-07-03T13:41:31Z)
- "Am I A Good Therapist?" Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies [38.726068038788384]
We describe our platform and its performance, using a dataset of more than 5,000 recordings.
Our system gives comprehensive feedback to the therapist, including information about the dynamics of the session.
We are confident that a widespread use of automated psychotherapy rating tools in the near future will augment experts' capabilities.
arXiv Detail & Related papers (2021-02-22T18:52:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.