MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
- URL: http://arxiv.org/abs/2403.03744v5
- Date: Wed, 09 Oct 2024 17:22:24 GMT
- Title: MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
- Authors: Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
- Abstract summary: We first define the notion of medical safety in large language models (LLMs) based on the Principles of Medical Ethics set forth by the American Medical Association.
We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset designed to measure the medical safety of LLMs.
Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety while preserving their medical performance.
- Score: 32.35118292932457
- Abstract: As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety while preserving their medical performance. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, paving the way to mitigate the safety risks of LLMs in medicine. The benchmark dataset and code are available at https://github.com/AI4LIFE-GROUP/med-safety-bench.
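The evaluation workflow described in the abstract (prompting an LLM with the benchmark's harmful medical requests and then scoring its responses against the AMA principles) can be illustrated with a short script. The following is a minimal sketch, not the authors' released code: the CSV path, the "harmful_request" column, and the placeholder model name are assumptions for illustration, and the repository's actual file layout and judging procedure may differ.

```python
# Minimal sketch: collect a model's responses to MedSafetyBench-style prompts.
# Assumptions (NOT taken from the paper's repository): prompts are stored in a
# local CSV named "med_safety_prompts.csv" with a "harmful_request" column, and
# any Hugging Face causal LM can stand in for the medical LLM under test.
import pandas as pd
from transformers import pipeline

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model under test


def load_prompts(path: str = "med_safety_prompts.csv") -> list[str]:
    """Read the benchmark's harmful medical requests from a local CSV."""
    df = pd.read_csv(path)
    return df["harmful_request"].tolist()


def generate_responses(prompts: list[str]) -> list[str]:
    """Generate one greedy response per prompt with the model under evaluation."""
    generator = pipeline("text-generation", model=MODEL_NAME)
    responses = []
    for prompt in prompts:
        out = generator(prompt, max_new_tokens=256, do_sample=False,
                        return_full_text=False)
        responses.append(out[0]["generated_text"])
    return responses


if __name__ == "__main__":
    prompts = load_prompts()
    responses = generate_responses(prompts)
    # In the paper's setup, each response would next be rated for adherence to
    # the AMA Principles of Medical Ethics (e.g., by an LLM judge); here we only
    # save the raw outputs for that downstream safety-scoring step.
    pd.DataFrame({"prompt": prompts, "response": responses}).to_csv(
        "responses_for_safety_scoring.csv", index=False)
```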
Related papers
- Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine [41.71754418349046]
We propose five key principles for safe and trustworthy medical AI, along with ten specific aspects.
Under this comprehensive framework, we introduce a novel MedGuard benchmark with 1,000 expert-verified questions.
Our evaluation of 11 commonly used LLMs shows that the current language models, regardless of their safety alignment mechanisms, generally perform poorly on most of our benchmarks.
This study underscores a significant safety gap, highlighting the crucial need for human oversight and the implementation of AI safety guardrails.
arXiv Detail & Related papers (2024-11-20T06:34:32Z)
- LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [80.45174785447136]
Laboratory accidents pose significant risks to human life and property.
Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices.
There is growing concern about the increasing reliance on large language models (LLMs) for guidance in various fields, including laboratory settings.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
- MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models [55.215061531495984]
"MedBench" is a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLM.
First, MedBench assembles the largest evaluation dataset (300,901 questions) to cover 43 clinical specialties.
Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer remembering.
arXiv Detail & Related papers (2024-06-24T02:25:48Z)
- Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models [9.860799633304298]
This paper delves into the underexplored security vulnerabilities of MedMLLMs.
By combining existing clinical medical data with atypical natural phenomena, we define the mismatched malicious attack.
We propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs.
arXiv Detail & Related papers (2024-05-26T19:11:21Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
- Medical Foundation Models are Susceptible to Targeted Misinformation Attacks [3.252906830953028]
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains.
We demonstrate a concerning vulnerability of LLMs in medicine through targeted manipulation of just 1.1% of the model's weights.
We validate our findings in a set of 1,038 incorrect biomedical facts.
arXiv Detail & Related papers (2023-09-29T06:44:36Z)
- SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs).
It comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns.
Our tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.