A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
- URL: http://arxiv.org/abs/2512.08185v1
- Date: Tue, 09 Dec 2025 02:28:15 GMT
- Title: A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
- Authors: Jinghao Wang, Ping Zhang, Carter Yagemann,
- Abstract summary: Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties.<n>Existing security benchmarks require GPU clusters, commercial API access, or protected health data.<n>We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints.
- Score: 11.500745861209774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.
Related papers
- Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming [23.573537738272595]
We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with cognitive-affective models.<n>We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents.<n>Our large-scale simulation reveals critical safety gaps in the use of AI for mental health support.
arXiv Detail & Related papers (2026-02-23T15:17:18Z) - Responsible Evaluation of AI for Mental Health [72.85175110624736]
Current approaches to evaluating AI tools in mental health care are fragmented and poorly aligned with clinical practice, social context, and first-hand user experience.<n>This paper argues for a rethinking of responsible evaluation by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity.
arXiv Detail & Related papers (2026-01-20T12:55:10Z) - Privacy Challenges and Solutions in Retrieval-Augmented Generation-Enhanced LLMs for Healthcare Chatbots: A Review of Applications, Risks, and Future Directions [3.36168223686933]
Retrieval-augmented generation (RAG) has rapidly emerged as a transformative approach for integrating large language models into clinical and biomedical systems.<n>This review provides a thorough analysis of the current landscape of RAG applications in healthcare.
arXiv Detail & Related papers (2025-11-14T14:33:58Z) - Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare.<n>Red-teaming framework that continuously stress-test LLMs can reveal significant weaknesses in four safety-critical domains.<n>A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses.<n>Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z) - Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings [48.096652370210016]
We introduce a safety evaluation protocol tailored to the medical domain in both patient user and clinician user perspectives.<n>This is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming taking three different points of view.
arXiv Detail & Related papers (2025-07-09T19:38:58Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare [15.438265972219869]
Large language models (LLMs) are increasingly utilized in healthcare applications.<n>This study systematically assesses the vulnerabilities of seven LLMs to three advanced black-box jailbreaking techniques.
arXiv Detail & Related papers (2025-01-27T22:07:52Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - RAISE -- Radiology AI Safety, an End-to-end lifecycle approach [5.829180249228172]
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency.
The focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy.
The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
arXiv Detail & Related papers (2023-11-24T15:59:14Z) - MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence
using Federated Evaluation [110.31526448744096]
We argue that unlocking this potential requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data.
We are building MedPerf, an open framework for benchmarking machine learning in the medical domain.
arXiv Detail & Related papers (2021-09-29T18:09:41Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.