Related papers: MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics

URL: http://arxiv.org/abs/2503.02374v1
Date: Tue, 04 Mar 2025 08:01:34 GMT
Title: MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
Authors: Haoan Jin, Jiacheng Shi, Hanhui Xu, Kenny Q. Zhu, Mengyue Wu
Abstract summary: This paper introduces MedEthicEval, a novel benchmark designed to evaluate large language models (LLMs) in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models' grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios.
Score: 30.129774371246086
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models' grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios. To support this benchmark, we consulted with medical ethics researchers and developed three datasets addressing distinct ethical challenges: blatant violations of medical ethics, priority dilemmas with clear inclinations, and equilibrium dilemmas without obvious resolutions. MedEthicEval serves as a critical tool for understanding LLMs' ethical reasoning in healthcare, paving the way for their responsible and effective use in medical contexts.

Related papers

Towards Assessing Medical Ethics from Knowledge to Practice [30.668836248264757]
We introduce PrinciplismQA, a comprehensive benchmark with 3,648 questions. This includes multiple-choice questions curated from authoritative textbooks and open-ended questions sourced from authoritative medical ethics case study literature. Our experiments reveal a significant gap between models' ethical knowledge and their practical application.
arXiv Detail & Related papers (2025-08-07T08:10:14Z)
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge. We then introduce our medical-specialized MLLM: Lingshu. Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
Ethical AI in the Healthcare Sector: Investigating Key Drivers of Adoption through the Multi-Dimensional Ethical AI Adoption Model (MEAAM) [1.5458951336481048]
This study introduces the Multi-Dimensional Ethical AI Adoption Model (MEAAM). It categorizes 13 critical ethical variables across four foundational dimensions of Ethical AI: Fair AI, Responsible AI, Explainable AI, and Sustainable AI. It investigates the influence of these ethical constructs on two outcomes: Operational AI Adoption and Systemic AI Adoption.
arXiv Detail & Related papers (2025-05-04T10:40:05Z)
Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns. Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance. We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models [0.0]
Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies which address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context.
arXiv Detail & Related papers (2024-12-20T10:06:52Z)
MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method. It is designed to enhance interpretability and accuracy in biomedical imaging inquiries. Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios [50.032101237019205]
CliMedBench is a comprehensive benchmark with 14 expert-guided core clinical scenarios. The reliability of this benchmark has been confirmed in several ways.
arXiv Detail & Related papers (2024-10-04T15:15:36Z)
The Role of Language Models in Modern Healthcare: A Comprehensive Review [2.048226951354646]
The application of large language models (LLMs) in healthcare has gained significant attention. This review examines the trajectory of language models from their early stages to the current state-of-the-art LLMs.
arXiv Detail & Related papers (2024-09-25T12:15:15Z)
Introducing ELLIPS: An Ethics-Centered Approach to Research on LLM-Based Inference of Psychiatric Conditions [0.6174527525452624]
This paper charts the ethical landscape of research on language-based inference of psychopathology. We identify seven core ethical principles that should guide model development and deployment. We translate these principles into questions that can guide researchers' choices.
arXiv Detail & Related papers (2024-09-06T12:27:38Z)
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians. Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions [23.36640449085249]
We trace the recent advances of Medical Large Language Models (Med-LLMs). The wide-ranging applications of Med-LLMs are investigated across various healthcare domains. We discuss the challenges associated with ensuring fairness, accountability, privacy, and robustness.
arXiv Detail & Related papers (2024-06-06T03:15:13Z)
A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [57.88111980149541]
We introduce Asclepius, a novel Med-MLLM benchmark that assesses Med-MLLMs in terms of distinct medical specialties and different diagnostic capacities. Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties. We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 3 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models [56.36916128631784]
We introduce MedBench, a comprehensive benchmark for the Chinese medical domain. This benchmark is composed of four key components: the Chinese Medical Licensing Examination, the Resident Standardization Training Examination, the Doctor In-Charge Qualification Examination, and real-world clinic cases. We perform extensive experiments and conduct an in-depth analysis from diverse perspectives.
arXiv Detail & Related papers (2023-12-20T07:01:49Z)
NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs [13.090847961966679]
We propose a set of guiding principles for the use of NLP in maternal healthcare. We surveyed healthcare workers and birthing people about their values, needs, and perceptions of NLP tools. For each principle, we describe its underlying rationale and provide practical advice.
arXiv Detail & Related papers (2023-12-19T02:35:13Z)
Towards A Unified Utilitarian Ethics Framework for Healthcare Artificial Intelligence [0.08192907805418582]
This study attempts to identify the major ethical principles influencing the utility performance of AI at different technological levels. Justice, privacy, bias, lack of regulations, risks, and interpretability are the most important principles to consider for ethical AI. We propose a new utilitarian ethics-based theoretical framework for designing ethical AI for the healthcare domain.
arXiv Detail & Related papers (2023-09-26T02:10:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.