Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
- URL: http://arxiv.org/abs/2506.09067v1
- Date: Sun, 08 Jun 2025 16:26:51 GMT
- Title: Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
- Authors: Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani,
- Abstract summary: We propose a novel inference-time defense strategy to mitigate harmful queries.<n>We show that our strategy enhances model safety without significantly compromising performance.<n>We then introduce a mixed demonstration strategy as a trade-off solution for balancing security and performance.
- Score: 13.977100716044104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative medical vision-language models~(Med-VLMs) are primarily designed to generate complex textual information~(e.g., diagnostic reports) from multimodal inputs including vision modality~(e.g., medical images) and language modality~(e.g., clinical queries). However, their security vulnerabilities remain underexplored. Med-VLMs should be capable of rejecting harmful queries, such as \textit{Provide detailed instructions for using this CT scan for insurance fraud}. At the same time, addressing security concerns introduces the risk of over-defense, where safety-enhancing mechanisms may degrade general performance, causing Med-VLMs to reject benign clinical queries. In this paper, we propose a novel inference-time defense strategy to mitigate harmful queries, enabling defense against visual and textual jailbreak attacks. Using diverse medical imaging datasets collected from nine modalities, we demonstrate that our defense strategy based on synthetic clinical demonstrations enhances model safety without significantly compromising performance. Additionally, we find that increasing the demonstration budget alleviates the over-defense issue. We then introduce a mixed demonstration strategy as a trade-off solution for balancing security and performance under few-shot demonstration budget constraints.
Related papers
- CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs [7.597770587484936]
We introduce CARES (Clinical Adversarial Robustness and Evaluation of Safety), a benchmark for evaluating medical large language models (LLMs) safety in healthcare.<n> CARES includes over 18,000 prompts spanning eight medical safety principles, four harm levels, and four prompting styles to simulate both malicious and benign use cases.<n>Our analysis reveals that many state-of-the-art LLMs remain vulnerable to jailbreaks that subtly rephrase harmful prompts, while also over-refusing safe but atypically phrased queries.
arXiv Detail & Related papers (2025-05-16T16:25:51Z) - Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense [90.71884758066042]
Large vision-language models (LVLMs) introduce a unique vulnerability: susceptibility to malicious attacks via visual inputs.<n>We propose ESIII (Embedding Security Instructions Into Images), a novel methodology for transforming the visual space from a source of vulnerability into an active defense mechanism.
arXiv Detail & Related papers (2025-03-14T17:39:45Z) - Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety [296.5392512998251]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats.<n>We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z) - Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare [15.438265972219869]
Large language models (LLMs) are increasingly utilized in healthcare applications.<n>This study systematically assesses the vulnerabilities of seven LLMs to three advanced black-box jailbreaking techniques.
arXiv Detail & Related papers (2025-01-27T22:07:52Z) - Prompt Injection Attacks on Large Language Models in Oncology [1.6631057801468496]
Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways.
These models can be attacked by prompt injection attacks, which can be used to output harmful information just by interacting with the VLM.
We show that embedding sub-visual prompts in medical imaging data can cause the model to provide harmful output.
arXiv Detail & Related papers (2024-07-23T15:29:57Z) - Adversarial Attacks on Large Language Models in Medicine [34.17895005922139]
The integration of Large Language Models into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care.<n>The susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts.<n>This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks.
arXiv Detail & Related papers (2024-06-18T04:24:30Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models [9.860799633304298]
This paper delves into the underexplored security vulnerabilities of MedMLLMs.
By combining existing clinical medical data with atypical natural phenomena, we define the mismatched malicious attack.
We propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs.
arXiv Detail & Related papers (2024-05-26T19:11:21Z) - Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs)
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.