MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
- URL: http://arxiv.org/abs/2509.04183v1
- Date: Thu, 04 Sep 2025 12:59:24 GMT
- Title: MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
- Authors: Aishik Mandal, Tanmoy Chakraborty, Iryna Gurevych
- Abstract summary: We introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. Empirical results show that MAGneT significantly outperforms existing methods in quality, diversity, and therapeutic alignment of the generated counseling sessions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing demand for scalable psychological counseling highlights the need for fine-tuning open-source Large Language Models (LLMs) with high-quality, privacy-compliant data, yet such data remains scarce. Here we introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation that decomposes counselor response generation into coordinated sub-tasks handled by specialized LLM agents, each modeling a key psychological technique. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. In addition, we address inconsistencies in prior evaluation protocols by proposing a unified evaluation framework integrating diverse automatic and expert metrics. Furthermore, we expand the expert evaluations from four aspects of counseling in previous works to nine aspects, enabling a more thorough and robust assessment of data quality. Empirical results show that MAGneT significantly outperforms existing methods in quality, diversity, and therapeutic alignment of the generated counseling sessions, improving general counseling skills by 3.2% and CBT-specific skills by 4.3% on average on cognitive therapy rating scale (CTRS). Crucially, experts prefer MAGneT-generated sessions in 77.2% of cases on average across all aspects. Moreover, fine-tuning an open-source model on MAGneT-generated sessions shows better performance, with improvements of 6.3% on general counseling skills and 7.3% on CBT-specific skills on average on CTRS over those fine-tuned with sessions generated by baseline methods. We also make our code and data public.
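The abstract describes decomposing counselor response generation into coordinated sub-tasks, each handled by a specialized LLM agent modeling one psychological technique. A minimal sketch of that coordination pattern is below; the agent names (reflection, open question, validation), the `SessionState` type, and the plan-then-compose logic are illustrative assumptions, not the paper's actual implementation, and the agents here are stubs where real LLM calls would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SessionState:
    history: List[str]  # alternating client/counselor turns, most recent last


# Each specialized "agent" models one psychological technique and proposes a
# partial counselor response given the dialogue history. In a real system,
# each function body would be a prompted LLM call.
def reflection_agent(state: SessionState) -> str:
    last_client_turn = state.history[-1]
    return f"It sounds like you're saying: {last_client_turn}"


def question_agent(state: SessionState) -> str:
    return "Could you tell me more about when this feeling started?"


def validation_agent(state: SessionState) -> str:
    return "What you're feeling makes sense given the situation."


AGENTS: Dict[str, Callable[[SessionState], str]] = {
    "reflection": reflection_agent,
    "open_question": question_agent,
    "validation": validation_agent,
}


def coordinate(state: SessionState, plan: List[str]) -> str:
    """Compose one counselor turn from the sub-task outputs named in `plan`."""
    parts = [AGENTS[name](state) for name in plan]
    return " ".join(parts)


state = SessionState(history=["I can't stop worrying about work."])
turn = coordinate(state, ["validation", "reflection", "open_question"])
print(turn)
```

The key design idea this sketch mirrors is that the composition (which techniques to apply, in what order) is itself a decision point, separate from generating each technique's content.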
Related papers
- Multi-dimensional Assessment and Explainable Feedback for Counselor Responses to Client Resistance in Text-based Counseling with LLMs [28.919083157390464]
We present a comprehensive pipeline for the multi-dimensional evaluation of human counselors' interventions targeting client resistance in text-based therapy. We introduce a theory-driven framework that decomposes counselor responses into four distinct communication mechanisms. We show that our approach can effectively distinguish the quality of different communication mechanisms.
arXiv Detail & Related papers (2026-02-25T07:05:05Z) - PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor [26.81428514159215]
PsychEval is a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges. It demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills.
arXiv Detail & Related papers (2026-01-05T05:26:57Z) - Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models [57.73472878679636]
We introduce Med-RewardBench, the first benchmark specifically designed to evaluate medical reward models and judges. Med-RewardBench features a multimodal dataset spanning 13 organ systems and 8 clinical departments, with 1,026 expert-annotated cases. A rigorous three-step process ensures high-quality evaluation data across six clinically critical dimensions.
arXiv Detail & Related papers (2025-08-29T08:58:39Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback [51.26493826461026]
We propose Psi-Arena, an interactive framework for comprehensive assessment and optimization of LLM-based psychological counselors. Psi-Arena features realistic interactions that simulate real-world counseling through multi-stage dialogues with psychologically profiled NPC clients. Experiments across eight state-of-the-art LLMs show significant performance variations across real-world scenarios and evaluation perspectives.
arXiv Detail & Related papers (2025-05-06T08:22:51Z) - AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling [57.054489290192535]
Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues. Online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame.
arXiv Detail & Related papers (2025-01-16T09:57:12Z) - CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. CBT-BENCH comprises three levels of tasks: I, basic CBT knowledge acquisition, via multiple-choice questions; II, cognitive model understanding, via cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; and III, therapeutic response generation, i.e., generating responses to patient speech in CBT therapy sessions. Experimental results indicate that while LLMs perform well at reciting CBT knowledge, they fall short in complex real-world scenarios.
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - Advancing Mental Health Pre-Screening: A New Custom GPT for Psychological Distress Assessment [0.8287206589886881]
'Psycho Analyst' is a custom GPT model based on OpenAI's GPT-4, optimized for pre-screening mental health disorders. The model adeptly decodes nuanced linguistic indicators of mental health disorders.
arXiv Detail & Related papers (2024-08-03T00:38:30Z) - Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory [24.937025825501998]
We create a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT).
We benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations.
Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent.
arXiv Detail & Related papers (2024-07-03T13:41:31Z) - Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study [17.32433545370711]
Comprehensive summaries of sessions enable an effective continuity in mental health counseling.
Manual summarization presents a significant challenge, diverting experts' attention from the core counseling process.
This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions.
arXiv Detail & Related papers (2024-02-29T11:29:47Z)