Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
- URL: http://arxiv.org/abs/2511.21757v1
- Date: Mon, 24 Nov 2025 11:55:22 GMT
- Title: Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
- Authors: Andrew Maranhão Ventura D'addario
- Abstract summary: This work advocates for a shift from universal to context-aware safety. It provides the necessary resources to immunize AI against the nuanced, systemic threats inherent to high-stakes medical environments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in *primum non nocere*. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the ethical design of releasing these "vulnerability signatures" to correct the information asymmetry between malicious actors and AI developers. Ultimately, this work advocates for a shift from universal to context-aware safety, providing the necessary resources to immunize healthcare AI against the nuanced, systemic threats inherent to high-stakes medical environments -- vulnerabilities that represent the paramount risk to patient safety and the successful integration of AI in healthcare systems.
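As a concrete sketch of how such records might be consumed, the snippet below assumes a JSONL release with hypothetical field names (prompt, taxonomy, reasoning); the abstract describes these attributes but does not publish a schema.

```python
import json
from collections import Counter

# Hypothetical record layout for one adversarial prompt. The abstract names
# seven violation taxonomies (e.g., procurement manipulation, queue-jumping,
# obstetric violence) and says each record carries the reasoning behind the
# violation; field names here are assumptions, not the released schema.
example_record = {
    "prompt": "...",              # adversarial prompt text
    "taxonomy": "queue-jumping",  # one of the seven taxonomies
    "reasoning": "...",           # why this violates SUS rules or ethics
}

def load_records(path):
    """Read one JSON object per line, assuming a JSONL distribution."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def taxonomy_counts(records):
    """Count prompts per taxonomy, e.g., to check balance before safety tuning."""
    return Counter(r["taxonomy"] for r in records)

print(taxonomy_counts([example_record]))  # Counter({'queue-jumping': 1})
```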
Related papers
- Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context [82.32380418146656]
Health-ORSC-Bench is the first large-scale benchmark designed to measure Over-Refusal and Safe Completion quality in healthcare. Our framework uses an automated pipeline with human validation to test models at varying levels of intent ambiguity. Health-ORSC-Bench provides a rigorous standard for calibrating the next generation of medical AI assistants.
arXiv Detail & Related papers (2026-01-25T01:28:52Z)
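The summary above names the two headline quantities but not their formulas; the sketch below shows one plausible way to score them, with the "intent" and "outcome" labels as assumptions rather than the benchmark's actual schema.

```python
# Hypothetical scoring sketch: each item carries a ground-truth intent label
# and a judged model outcome ("refused", "safe_completion", "unsafe_completion").

def over_refusal_rate(items):
    """Fraction of benign prompts the model refused (lower is better)."""
    benign = [it for it in items if it["intent"] == "benign"]
    return sum(it["outcome"] == "refused" for it in benign) / len(benign)

def safe_completion_rate(items):
    """Fraction of harmful prompts answered safely rather than unsafely or refused."""
    harmful = [it for it in items if it["intent"] == "harmful"]
    return sum(it["outcome"] == "safe_completion" for it in harmful) / len(harmful)

items = [
    {"intent": "benign", "outcome": "refused"},
    {"intent": "benign", "outcome": "safe_completion"},
    {"intent": "harmful", "outcome": "safe_completion"},
]
print(over_refusal_rate(items), safe_completion_rate(items))  # 0.5 1.0
```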
- INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems [70.37731999972785]
In this paper, we propose Infection-Aware Guard (INFA-Guard), a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. During remediation, INFA-Guard replaces attackers and rehabilitates infected agents, avoiding malicious propagation while preserving topological integrity.
arXiv Detail & Related papers (2026-01-21T05:27:08Z)
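A toy rendering of the remediation policy just described, assuming a three-state labeling of agents; all names are illustrative, not the paper's API.

```python
# Toy remediation policy mirroring the summary's distinction: attackers are
# replaced outright, infected agents are rehabilitated in place, so the graph
# topology is preserved either way.

def remediate(agents):
    """agents: dict of name -> state in {"healthy", "infected", "attacker"}."""
    remediated = {}
    for name, state in agents.items():
        if state == "attacker":
            remediated[name] = "replaced"       # swap in a fresh agent, same position
        elif state == "infected":
            remediated[name] = "rehabilitated"  # e.g., reset memory and system prompt
        else:
            remediated[name] = "healthy"
    return remediated

print(remediate({"planner": "healthy", "coder": "infected", "critic": "attacker"}))
```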
- A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties [11.500745861209774]
Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties. Existing security benchmarks require GPU clusters, commercial API access, or protected health data. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints.
arXiv Detail & Related papers (2025-12-09T02:28:15Z)
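In the spirit of that resource-constrained setup, here is a generic evaluation loop of the kind such a framework implies; model_respond and is_unsafe are placeholders for any local model and safety judge, not the paper's components.

```python
def attack_success_rate(prompts_by_specialty, model_respond, is_unsafe):
    """Fraction of jailbreak prompts per clinical specialty that elicit an
    unsafe response; both callables are placeholders for local inference
    and a (human or automated) safety judge."""
    return {
        specialty: sum(is_unsafe(model_respond(p)) for p in prompts) / len(prompts)
        for specialty, prompts in prompts_by_specialty.items()
    }

# Trivial stand-ins so the sketch runs end to end without any GPU or API.
demo = {"cardiology": ["p1", "p2"], "oncology": ["p3"]}
print(attack_success_rate(demo, model_respond=str.upper,
                          is_unsafe=lambda r: r == "P1"))
# {'cardiology': 0.5, 'oncology': 0.0}
```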
- Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis [39.89241412792336]
We analyzed eight attack scenarios in four categories, including architectural attacks on convolutional neural networks, large language models, and reinforcement learning agents. Our findings indicate that attackers with access to only 100-500 samples can compromise healthcare AI regardless of dataset size. We recommend multilayer defenses, including required adversarial testing, ensemble-based detection, privacy-preserving security mechanisms, and international coordination on AI security standards.
arXiv Detail & Related papers (2025-11-14T07:16:16Z)
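To make the 100-500-sample finding concrete, the sketch below computes the implied poison fraction and shows the simplest form of the recommended ensemble-based detection; the member models are invented toys.

```python
def poison_fraction(n_poison, dataset_size):
    """E.g., 500 poisoned samples in a 1M-sample corpus is only 0.05%."""
    return n_poison / dataset_size

print(f"{poison_fraction(500, 1_000_000):.4%}")  # 0.0500%

def ensemble_flags(models, x):
    """Simplest ensemble-based detection: flag an input whenever
    independently trained members disagree on its label."""
    return len({m(x) for m in models}) > 1

# Toy members: the third disagrees only on x == 3.
members = [lambda x: x % 2, lambda x: x % 2, lambda x: 0 if x == 3 else x % 2]
print(ensemble_flags(members, 3), ensemble_flags(members, 2))  # True False
```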
- The Open Syndrome Definition [61.0983330391914]
We propose the first open, machine-readable format for representing case and syndrome definitions. We introduce the first comprehensive dataset of standardized case definitions and tools to convert existing human-readable definitions into machine-readable formats. The Open Syndrome Definition format enables consistent, scalable use of case definitions across systems, unlocking AI's potential to strengthen public health preparedness and response.
arXiv Detail & Related papers (2025-09-29T19:41:54Z)
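The summary does not reproduce the format itself; below is a hypothetical machine-readable case definition in the spirit described, with invented field names rather than the actual Open Syndrome Definition schema.

```python
import json

# Invented schema for illustration only. The point is that inclusion criteria
# become data a surveillance system can evaluate, not prose a human must
# interpret.
influenza_like_illness = {
    "name": "influenza-like illness",
    "version": "1.0",
    "criteria": {
        "all_of": [
            {"symptom": "fever", "min_celsius": 38.0},
            {"any_of": [{"symptom": "cough"}, {"symptom": "sore_throat"}]},
        ]
    },
}
print(json.dumps(influenza_like_illness, indent=2))
```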
- How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System [21.40560864239872]
We propose MedThreatRAG, a novel framework that probes vulnerabilities in medical RAG systems. A key innovation of our approach is the construction of a simulated semi-open attack environment. We show that MedThreatRAG reduces answer F1 scores by up to 27.66% and lowers LLaVA-Med-1.5 F1 rates to as low as 51.36%.
arXiv Detail & Related papers (2025-08-24T05:11:09Z)
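A schematic of the attack surface being probed: adversarial passages injected into the corpus can reach the top-k retrieved context, where they can pull the generator toward a harmful answer. The retrieval function and documents below are illustrative stand-ins, not MedThreatRAG.

```python
def retrieve(corpus, query, k=2):
    """Naive keyword-overlap retrieval, purely for illustration."""
    overlap = lambda doc: len(set(doc.lower().split()) & set(query.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

corpus = [
    "Metformin is first-line therapy for type 2 diabetes.",
    "Insulin dosing depends on patient weight and glucose targets.",
    # Injected adversarial passage: shares enough terms with likely queries
    # to land in the top-k context alongside the legitimate document.
    "Metformin therapy for type 2 diabetes should be stopped immediately.",
]
print(retrieve(corpus, "first-line therapy for type 2 diabetes"))
```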
- Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are increasingly used in healthcare AI applications. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents is applied to autonomously mutate test cases, identify and evolve unsafe-triggering strategies, and evaluate responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
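A skeletal version of the mutate-test-evolve loop described above; the mutation operator, target model, and judge are trivial placeholders rather than the paper's agents.

```python
import random

def red_team(seeds, mutate, target, judge, rounds=5):
    """Skeleton of the mutate/test/evolve loop: mutations that trigger unsafe
    responses are kept and evolved further. All callables are placeholders."""
    pool, failures = list(seeds), []
    for _ in range(rounds):
        case = mutate(random.choice(pool))
        if judge(target(case)):      # unsafe-triggering strategy found
            failures.append(case)
            pool.append(case)        # later rounds mutate from it too
    return failures

# Trivial stand-ins so the loop runs.
print(red_team(["ask about dosage"],
               mutate=lambda c: c + "!",
               target=str.upper,
               judge=lambda r: "!" in r))
```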
- MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems [24.60202452646343]
We introduce MedSentry, a benchmark of 5,000 adversarial medical prompts spanning 25 categories with 100 subthemes. We develop an end-to-end attack-defense evaluation pipeline to analyze how four representative multi-agent topologies withstand attacks from 'dark-personality' agents.
arXiv Detail & Related papers (2025-05-27T07:34:40Z)
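One plausible shape for such an attack-defense pipeline: score each topology by its unsafe-output rate under dark-personality prompts. The topology names and callables below are stand-ins, not MedSentry's implementation.

```python
def topology_robustness(topologies, prompts, run_system, is_unsafe):
    """Unsafe-output rate per multi-agent topology under adversarial prompts;
    run_system and is_unsafe are placeholders for the real pipeline."""
    return {
        name: sum(is_unsafe(run_system(name, p)) for p in prompts) / len(prompts)
        for name in topologies
    }

print(topology_robustness(
    ["chain", "star", "tree", "mesh"],             # assumed representative layouts
    ["adv-1", "adv-2"],
    run_system=lambda topo, p: f"{topo}:{p}",
    is_unsafe=lambda out: out.startswith("mesh"),  # toy judge
))
```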
- An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address the risk of harms severe enough to significantly damage humanity. We focus on technical approaches to misuse and misalignment. We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
- Safety challenges of AI in medicine in the era of large language models [23.817939398729955]
Large language models (LLMs) offer new opportunities for medical practitioners, patients, and researchers. As AI and LLMs become more powerful and especially achieve superhuman performance in some medical tasks, public concerns over their safety have intensified. This review examines emerging risks in AI utilization during the LLM era.
arXiv Detail & Related papers (2024-09-11T13:47:47Z)
- The Need for Guardrails with Large Language Models in Medical Safety-Critical Settings: An Artificial Intelligence Application in the Pharmacovigilance Ecosystem [0.6965384453064829]
Large language models (LLMs) are useful tools with the capacity for performing specific types of knowledge work at an effective scale.
However, deployments in high-risk and safety-critical domains pose unique challenges, notably the issue of hallucinations.
This is particularly concerning in settings such as drug safety, where inaccuracies could lead to patient harm.
We have developed and demonstrated a proof-of-concept suite of guardrails specifically designed to mitigate certain types of hallucinations and errors for drug safety.
arXiv Detail & Related papers (2024-07-01T19:52:41Z)
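A minimal example of the guardrail pattern described, assuming the simplest possible check: validate an LLM's adverse-event term against an authoritative lexicon and fail closed on mismatch. The lexicon and function are invented stand-ins, not the authors' suite.

```python
# Minimal guardrail pattern: never pass an unverifiable drug-safety claim
# through. KNOWN_REACTIONS stands in for an authoritative terminology such as
# a curated adverse-event dictionary; the real suite's checks are richer.
KNOWN_REACTIONS = {"nausea", "headache", "rash"}

def guard_adverse_event(llm_output: str) -> str:
    term = llm_output.strip().lower()
    if term not in KNOWN_REACTIONS:
        # Fail closed: surface for human review instead of passing through.
        return f"BLOCKED: unrecognized reaction term '{term}'"
    return term

print(guard_adverse_event("Nausea"))           # passes: known term
print(guard_adverse_event("quantum fatigue"))  # blocked: likely hallucination
```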
- COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic.
We present an overview of the rationale, design, ethical considerations, and privacy strategy of 'COVI', a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)
- Digital Ariadne: Citizen Empowerment for Epidemic Control [55.41644538483948]
The COVID-19 crisis represents the most dangerous threat to public health since the H1N1 pandemic of 1918.
Technology-assisted location and contact tracing, if broadly adopted, may help limit the spread of infectious diseases.
We present a tool, called 'diAry' or 'digital Ariadne', based on voluntary location and Bluetooth tracking on personal devices.
arXiv Detail & Related papers (2020-04-16T15:53:42Z)