Towards Assessing Medical Ethics from Knowledge to Practice
- URL: http://arxiv.org/abs/2508.05132v1
- Date: Thu, 07 Aug 2025 08:10:14 GMT
- Title: Towards Assessing Medical Ethics from Knowledge to Practice
- Authors: Chang Hong, Minghao Wu, Qingying Xiao, Yuchi Wang, Xiang Wan, Guangjun Yu, Benyou Wang, Yan Hu
- Abstract summary: We introduce PrinciplismQA, a comprehensive benchmark with 3,648 questions. This includes multiple-choice questions curated from authoritative textbooks and open-ended questions sourced from authoritative medical ethics case study literature. Our experiments reveal a significant gap between models' ethical knowledge and their practical application.
- Score: 30.668836248264757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of large language models into healthcare necessitates a rigorous evaluation of their ethical reasoning, an area current benchmarks often overlook. We introduce PrinciplismQA, a comprehensive benchmark with 3,648 questions designed to systematically assess LLMs' alignment with core medical ethics. Grounded in Principlism, our benchmark features a high-quality dataset. This includes multiple-choice questions curated from authoritative textbooks and open-ended questions sourced from authoritative medical ethics case study literature, all validated by medical experts. Our experiments reveal a significant gap between models' ethical knowledge and their practical application, especially in dynamically applying ethical principles to real-world scenarios. Most LLMs struggle with dilemmas concerning Beneficence, often over-emphasizing other principles. Frontier closed-source models, driven by strong general capabilities, currently lead the benchmark. Notably, medical domain fine-tuning can enhance models' overall ethical competence, but further progress requires better alignment with medical ethical knowledge. PrinciplismQA offers a scalable framework to diagnose these specific ethical weaknesses, paving the way for more balanced and responsible medical AI.
Related papers
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
- MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs [18.92960063905292]
This paper introduces MedEthicsQA, a comprehensive benchmark comprising 5,623 multiple-choice questions and 5,351 open-ended questions for evaluation of medical ethics in LLMs. We systematically establish a hierarchical taxonomy integrating global medical ethical standards. The benchmark encompasses widely used medical datasets, authoritative question banks, and scenarios derived from literature.
arXiv Detail & Related papers (2025-06-28T08:21:35Z)
- Ethical AI in the Healthcare Sector: Investigating Key Drivers of Adoption through the Multi-Dimensional Ethical AI Adoption Model (MEAAM) [1.5458951336481048]
This study introduces the Multi-Dimensional Ethical AI Adoption Model (MEAAM). It categorizes 13 critical ethical variables across four foundational dimensions of Ethical AI: Fair AI, Responsible AI, Explainable AI, and Sustainable AI. It investigates the influence of these ethical constructs on two outcomes: Operational AI Adoption and Systemic AI Adoption.
arXiv Detail & Related papers (2025-05-04T10:40:05Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics [30.129774371246086]
This paper introduces MedEthicEval, a novel benchmark designed to evaluate large language models (LLMs) in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models' grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios.
arXiv Detail & Related papers (2025-03-04T08:01:34Z)
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
- Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z)
- The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) [0.0]
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare.
Despite their potential benefits, researchers have underscored various ethical implications.
This work aims to map the ethical landscape surrounding the current stage of deployment of LLMs in medicine and healthcare.
arXiv Detail & Related papers (2024-03-21T15:20:07Z)
- Towards A Unified Utilitarian Ethics Framework for Healthcare Artificial Intelligence [0.08192907805418582]
This study attempts to identify the major ethical principles influencing the utility performance of AI at different technological levels.
Justice, privacy, bias, lack of regulations, risks, and interpretability are the most important principles to consider for ethical AI.
We propose a new utilitarian ethics-based theoretical framework for designing ethical AI for the healthcare domain.
arXiv Detail & Related papers (2023-09-26T02:10:58Z)
- Case Study: Deontological Ethics in NLP [119.53038547411062]
We study one ethical theory, namely deontological ethics, from the perspective of NLP.
In particular, we focus on the generalization principle and the respect for autonomy through informed consent.
We provide four case studies to demonstrate how these principles can be used with NLP systems.
arXiv Detail & Related papers (2020-10-09T16:04:51Z)
- Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)