Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming
- URL: http://arxiv.org/abs/2602.19948v1
- Date: Mon, 23 Feb 2026 15:17:18 GMT
- Title: Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming
- Authors: Ian Steenstra, Paola Pedrelli, Weiyan Shi, Stacy Marsella, Timothy W. Bickmore,
- Abstract summary: We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with cognitive-affective models.<n>We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents.<n>Our large-scale simulation reveals critical safety gaps in the use of AI for mental health support.
- Score: 23.573537738272595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character.AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.
Related papers
- Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics [36.981753143345664]
We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making.<n>By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations.
arXiv Detail & Related papers (2026-01-31T13:41:32Z) - Responsible Evaluation of AI for Mental Health [72.85175110624736]
Current approaches to evaluating AI tools in mental health care are fragmented and poorly aligned with clinical practice, social context, and first-hand user experience.<n>This paper argues for a rethinking of responsible evaluation by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity.
arXiv Detail & Related papers (2026-01-20T12:55:10Z) - Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models.<n>Each model decomposes attacks into steps using the MITRE ATT&CK framework.<n>Individual estimates are aggregated through Monte Carlo simulation.
arXiv Detail & Related papers (2025-12-09T17:54:17Z) - MindEval: Benchmarking Language Models on Multi-turn Mental Health Support [10.524387723320432]
MindEval is a framework for automatically evaluating language models in realistic, multi-turn mental health therapy conversations.<n>We quantitatively validate the realism of our simulated patients against human-generated text and by demonstrating strong correlations between automatic and human expert judgments.<n>We evaluate 12 state-of-the-art LLMs and show that all models struggle, scoring below 4 out of 6 on average, with particular weaknesses in problematic AI-specific patterns of communication.
arXiv Detail & Related papers (2025-11-23T15:19:29Z) - Mentalic Net: Development of RAG-based Conversational AI and Evaluation Framework for Mental Health Support [0.0]
Mentalic Net Conversational AI has a BERT Score of 0.898, with other evaluation metrics falling within satisfactory ranges.<n>We advocate for a human-in-the-loop approach and a long-term, responsible strategy in developing such transformative technologies.
arXiv Detail & Related papers (2025-08-27T03:44:56Z) - A Comprehensive Review of Datasets for Clinical Mental Health AI Systems [55.67299586253951]
We present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants.<n>Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a lack of modalities in synthetic data.
arXiv Detail & Related papers (2025-08-13T13:42:35Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis.<n>MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - A Risk Ontology for Evaluating AI-Powered Psychotherapy Virtual Agents [13.721977133773192]
Large Language Models (LLMs) and Intelligent Virtual Agents acting as psychotherapists present opportunities for expanding mental healthcare access.<n>Their deployment has also been linked to serious adverse outcomes, including user harm and suicide.<n>We introduce a novel risk ontology specifically designed for the systematic evaluation of conversational AI psychotherapists.
arXiv Detail & Related papers (2025-05-21T05:01:39Z) - TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews [54.35097932763878]
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data.<n>Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews.<n>We demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness.
arXiv Detail & Related papers (2025-03-26T15:58:16Z) - Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities [58.61680631581921]
Mental health disorders create profound personal and societal burdens, yet conventional diagnostics are resource-intensive and limit accessibility.<n>This paper examines these challenges and proposes solutions, including anonymization, synthetic data, and privacy-preserving training.<n>It aims to advance reliable, privacy-aware AI tools that support clinical decision-making and improve mental health outcomes.
arXiv Detail & Related papers (2025-02-01T15:10:02Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.