PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support
- URL: http://arxiv.org/abs/2601.12754v1
- Date: Mon, 19 Jan 2026 06:20:57 GMT
- Title: PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support
- Authors: Jiwon Kim, Violeta J. Rodriguez, Dong Whi Yoo, Eshwar Chandrasekharan, Koustuv Saha
- Abstract summary: Large language models (LLMs) are increasingly used for mental health support. LLMs can produce responses that are overly directive, inconsistent, or clinically misaligned. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support.
- Score: 18.251267901872886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judge audits each response and provides structured ALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated motivational interviewing data. We find that Judge-supervised interactions show significant improvements in key MITI dimensions, including Partnership, Seek Collaboration, and overall Relational quality. Our quantitative findings are supported by qualitative expert evaluation, which further highlights the nuances of runtime supervision. Together, our results reveal that such a paired-agent approach can provide clinically grounded auditing and refinement for AI-assisted conversational mental health support.
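The abstract describes a runtime loop in which a Responder drafts a reply, a supervisory Judge audits it against MITI-grounded criteria, and a REVISE decision triggers refinement before the response is released. The following is a minimal sketch of that control flow; all names, the keyword-based audit heuristic, and the canned responses are illustrative assumptions, not the authors' implementation, which uses LLM agents and the full MITI-4 framework.

```python
from dataclasses import dataclass

@dataclass
class Audit:
    decision: str   # "ALLOW" or "REVISE"
    feedback: str   # structured guidance passed back to the Responder

def judge(response: str) -> Audit:
    # Toy stand-in for an LLM Judge grounded in MITI-4: flag overly
    # directive language and request revision with structured feedback.
    directive_markers = ("you should", "you must", "you need to")
    if any(m in response.lower() for m in directive_markers):
        return Audit("REVISE", "Avoid directive phrasing; emphasize partnership.")
    return Audit("ALLOW", "")

def respond(prompt: str, feedback: str = "") -> str:
    # Toy stand-in for the Responder agent; a real system would call an LLM,
    # conditioning the retry on the Judge's feedback.
    if feedback:
        return "It sounds like this has been hard. What feels most manageable to you?"
    return "You should start exercising every morning."

def supervised_reply(prompt: str, max_rounds: int = 3) -> str:
    feedback = ""
    draft = respond(prompt)
    for _ in range(max_rounds):
        draft = respond(prompt, feedback)
        audit = judge(draft)
        if audit.decision == "ALLOW":
            return draft
        feedback = audit.feedback
    return draft  # fall back to the last draft if the revision budget is exhausted

print(supervised_reply("I can't find motivation lately."))
```

The key design point this sketch captures is that the Judge's output is a structured decision plus feedback, so refinement happens transparently at runtime rather than implicitly through training or prompting alone.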
Related papers
- CARE: An Explainable Computational Framework for Assessing Client-Perceived Therapeutic Alliance Using Large Language Models [19.027335814014528]
We present CARE, an LLM-based framework to automatically predict multi-dimensional alliance scores and generate interpretable rationales from counseling transcripts. CARE is built on the CounselingWAI dataset and enriched with 9,516 expert-curated rationales. Experiments show that CARE outperforms leading LLMs and substantially reduces the gap between counselor evaluations and client-perceived alliance.
arXiv Detail & Related papers (2026-02-24T07:52:56Z) - Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming [23.573537738272595]
We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with cognitive-affective models. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents. Our large-scale simulation reveals critical safety gaps in the use of AI for mental health support.
arXiv Detail & Related papers (2026-02-23T15:17:18Z) - Mirror: A Multi-Agent System for AI-Assisted Ethics Review [104.3684024153469]
Mirror is an agentic framework for AI-assisted ethics review. It integrates ethical reasoning, structured rule interpretation, and multi-agent deliberation within a unified architecture.
arXiv Detail & Related papers (2026-02-09T03:38:55Z) - Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics [36.981753143345664]
We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations.
arXiv Detail & Related papers (2026-01-31T13:41:32Z) - Responsible Evaluation of AI for Mental Health [72.85175110624736]
Current approaches to evaluating AI tools in mental health care are fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity.
arXiv Detail & Related papers (2026-01-20T12:55:10Z) - Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching [0.0]
We present a secure, scalable proof-of-concept system for Artificial Intelligence (AI)-augmented patient-trial matching. The system generates structured eligibility assessments with interpretable reasoning chains that support human-in-the-loop review. The system aims to reduce coordinator burden, intelligently broaden the set of trials considered for each patient, and guarantee comprehensive auditability of all AI-generated outputs.
arXiv Detail & Related papers (2025-12-08T20:35:51Z) - Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries. We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z) - AgentMental: An Interactive Multi-Agent Framework for Explainable and Adaptive Mental Health Assessment [31.920800599579906]
Mental health assessment is crucial for early intervention and effective treatment, yet traditional clinician-based approaches are limited by the shortage of qualified professionals. Recent advances in artificial intelligence have sparked growing interest in automated psychological assessment, yet most existing approaches are constrained by their reliance on static text analysis. We propose a multi-agent framework for mental health evaluation that simulates clinical doctor-patient dialogues.
arXiv Detail & Related papers (2025-08-15T16:20:45Z) - EmoStage: A Framework for Accurate Empathetic Response Generation via Perspective-Taking and Phase Recognition [1.4436965372953483]
EmoStage is a framework that enhances empathetic response generation. Our framework introduces perspective-taking to infer clients' psychological states and support needs. Phase recognition is incorporated to ensure alignment with the counseling process.
arXiv Detail & Related papers (2025-06-24T03:18:37Z) - "Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions [5.481575506447599]
Mental health is a growing global concern, prompting interest in AI-driven solutions to expand access to psychosocial support. LLMs present new opportunities to enhance peer support, particularly in real-time, text-based interactions. We present and evaluate an AI-supported system with an LLM-simulated distressed client, context-sensitive LLM-generated suggestions, and real-time emotion visualisations.
arXiv Detail & Related papers (2025-06-11T03:06:41Z) - Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the "catfish effect" in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z) - Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback [51.26493826461026]
We propose Psi-Arena, an interactive framework for comprehensive assessment and optimization of LLM-based psychological counselors. Psi-Arena features realistic arena interactions that simulate real-world counseling through multi-stage dialogues with psychologically profiled NPC clients. Experiments across eight state-of-the-art LLMs show significant performance variations in different real-world scenarios and evaluation perspectives.
arXiv Detail & Related papers (2025-05-06T08:22:51Z) - MAGI: Multi-Agent Guided Interview for Psychiatric Assessment [50.6150986786028]
We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational navigation. We show that MAGI advances LLM-assisted mental health assessment by combining clinical rigor, conversational adaptability, and explainable reasoning.
arXiv Detail & Related papers (2025-04-25T11:08:27Z) - CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration [41.03565985067749]
This paper introduces CAMI, a novel automated counselor agent grounded in Motivational Interviewing (MI). We evaluate CAMI's performance through both automated and manual evaluations. Results show that CAMI not only outperforms several state-of-the-art methods but also shows more realistic counselor-like behavior.
arXiv Detail & Related papers (2025-02-05T01:09:09Z) - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [74.16170899755281]
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements, as well as a comprehensive evaluation toolkit. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront.
arXiv Detail & Related papers (2024-01-24T01:51:00Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.