EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
- URL: http://arxiv.org/abs/2509.11648v1
- Date: Mon, 15 Sep 2025 07:35:35 GMT
- Title: EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
- Authors: Sai Kartheek Reddy Kasu
- Abstract summary: EthicsMH is a pilot dataset of 125 scenarios designed to evaluate how AI systems navigate ethically charged situations. It establishes a task framework that bridges AI ethics and mental health decision-making.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment of large language models (LLMs) in mental health and other sensitive domains raises urgent questions about ethical reasoning, fairness, and responsible alignment. Yet, existing benchmarks for moral and clinical decision-making do not adequately capture the unique ethical dilemmas encountered in mental health practice, where confidentiality, autonomy, beneficence, and bias frequently intersect. To address this gap, we introduce Ethical Reasoning in Mental Health (EthicsMH), a pilot dataset of 125 scenarios designed to evaluate how AI systems navigate ethically charged situations in therapeutic and psychiatric contexts. Each scenario is enriched with structured fields, including multiple decision options, expert-aligned reasoning, expected model behavior, real-world impact, and multi-stakeholder viewpoints. This structure enables evaluation not only of decision accuracy but also of explanation quality and alignment with professional norms. Although modest in scale and developed with model-assisted generation, EthicsMH establishes a task framework that bridges AI ethics and mental health decision-making. By releasing this dataset, we aim to provide a seed resource that can be expanded through community and expert contributions, fostering the development of AI systems capable of responsibly handling some of society's most delicate decisions.
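The structured fields listed in the abstract can be sketched as a record type with a simple decision-accuracy metric. This is a minimal illustration only: the field names, the example scenario, and the `decision_accuracy` helper are assumptions for exposition, not the dataset's actual schema or official evaluation code.

```python
from dataclasses import dataclass

@dataclass
class EthicsMHScenario:
    # Field names are illustrative; the released dataset may label them differently.
    scenario: str                        # the ethically charged situation
    decision_options: list[str]          # multiple decision options
    expert_aligned_reasoning: str        # rationale aligned with professional norms
    expected_behavior: str               # the option an aligned model should choose
    real_world_impact: str               # consequences of the decision
    stakeholder_viewpoints: dict[str, str]  # multi-stakeholder perspectives

def decision_accuracy(predictions: list[str],
                      scenarios: list[EthicsMHScenario]) -> float:
    """Fraction of model decisions that match the expected behavior."""
    correct = sum(p == s.expected_behavior
                  for p, s in zip(predictions, scenarios))
    return correct / len(scenarios)

# Hypothetical example scenario in the spirit of the dataset description.
example = EthicsMHScenario(
    scenario="A client discloses intent to harm a named third party in therapy.",
    decision_options=["maintain confidentiality",
                      "notify authorities and the third party"],
    expert_aligned_reasoning="Duty-to-warn obligations can override "
                             "confidentiality when risk is credible.",
    expected_behavior="notify authorities and the third party",
    real_world_impact="Potential prevention of harm vs. erosion of "
                      "therapeutic trust.",
    stakeholder_viewpoints={
        "client": "Expects confidentiality as the basis of trust.",
        "clinician": "Bound by duty-to-warn precedents.",
        "third party": "Has a safety interest in disclosure.",
    },
)
```

Because each scenario also carries reasoning and impact fields, the same record supports grading explanation quality alongside decision accuracy, which is the dual evaluation the abstract describes.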
Related papers
- Mirror: A Multi-Agent System for AI-Assisted Ethics Review [104.3684024153469]
Mirror is an agentic framework for AI-assisted ethical review. It integrates ethical reasoning, structured rule interpretation, and multi-agent deliberation within a unified architecture.
arXiv Detail & Related papers (2026-02-09T03:38:55Z) - Responsible Evaluation of AI for Mental Health [72.85175110624736]
Current approaches to evaluating AI tools in mental health care are fragmented and poorly aligned with clinical practice, social context, and first-hand user experience. This paper argues for a rethinking of responsible evaluation by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity.
arXiv Detail & Related papers (2026-01-20T12:55:10Z) - PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics [35.52940216380734]
In mental health, clinically inadequate refusals can be perceived as unempathetic and discourage help-seeking. To address this gap, we move beyond refusal-centric metrics and introduce PsychEthicsBench, the first principle-grounded benchmark based on Australian psychology and psychiatry guidelines. Empirical results across 14 models show that refusal rates are poor indicators of ethical behavior, revealing a significant divergence between safety triggers and clinical appropriateness.
arXiv Detail & Related papers (2026-01-07T04:49:02Z) - A Comprehensive Review of Datasets for Clinical Mental Health AI Systems [55.67299586253951]
We present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants. Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a lack of modalities in synthetic data.
arXiv Detail & Related papers (2025-08-13T13:42:35Z) - Towards Assessing Medical Ethics from Knowledge to Practice [30.668836248264757]
We introduce PrinciplismQA, a comprehensive benchmark with 3,648 questions, comprising multiple-choice questions curated from authoritative textbooks and open-ended questions sourced from medical ethics case study literature. Our experiments reveal a significant gap between models' ethical knowledge and their practical application.
arXiv Detail & Related papers (2025-08-07T08:10:14Z) - Ethical AI in the Healthcare Sector: Investigating Key Drivers of Adoption through the Multi-Dimensional Ethical AI Adoption Model (MEAAM) [1.5458951336481048]
This study introduces the Multi-Dimensional Ethical AI Adoption Model (MEAAM). It categorizes 13 critical ethical variables across four foundational dimensions of Ethical AI: Fair AI, Responsible AI, Explainable AI, and Sustainable AI. It investigates the influence of these ethical constructs on two outcomes: Operational AI Adoption and Systemic AI Adoption.
arXiv Detail & Related papers (2025-05-04T10:40:05Z) - Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities [58.61680631581921]
Mental health disorders create profound personal and societal burdens, yet conventional diagnostics are resource-intensive and limit accessibility. This paper examines these challenges and proposes solutions, including anonymization, synthetic data, and privacy-preserving training. It aims to advance reliable, privacy-aware AI tools that support clinical decision-making and improve mental health outcomes.
arXiv Detail & Related papers (2025-02-01T15:10:02Z) - No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling [16.323742994936584]
We propose aspirational ethical principles specifically tailored for human-bot psycho-counseling.
We examined the responses generated by EVA2.0, GPT-3.5, and GPT-4.0 in the context of psycho-counseling and mental health inquiries.
arXiv Detail & Related papers (2024-04-22T10:29:04Z) - Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z) - Towards A Unified Utilitarian Ethics Framework for Healthcare Artificial Intelligence [0.08192907805418582]
This study attempts to identify the major ethical principles influencing the utility performance of AI at different technological levels.
Justice, privacy, bias, lack of regulations, risks, and interpretability are the most important principles to consider for ethical AI.
We propose a new utilitarian ethics-based theoretical framework for designing ethical AI for the healthcare domain.
arXiv Detail & Related papers (2023-09-26T02:10:58Z) - Ethics in conversation: Building an ethics assurance case for autonomous AI-enabled voice agents in healthcare [1.8964739087256175]
The principles-based ethics assurance argument pattern is one proposal in the AI ethics landscape.
This paper presents the interim findings of a case study applying this ethics assurance framework to the use of Dora, an AI-based telemedicine system.
arXiv Detail & Related papers (2023-05-23T16:04:59Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z) - Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.