Prompt Injection Attacks on Large Language Models in Oncology
- URL: http://arxiv.org/abs/2407.18981v1
- Date: Tue, 23 Jul 2024 15:29:57 GMT
- Title: Prompt Injection Attacks on Large Language Models in Oncology
- Authors: Jan Clusmann, Dyke Ferber, Isabella C. Wiest, Carolin V. Schneider, Titus J. Brinker, Sebastian Foersch, Daniel Truhn, Jakob N. Kather
- Abstract summary: Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways.
These models can be attacked by prompt injection attacks, which can be used to output harmful information just by interacting with the VLM.
We show that embedding sub-visual prompts in medical imaging data can cause the model to provide harmful output.
- Score: 1.6631057801468496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be attacked by prompt injection attacks, which can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We performed a quantitative study to evaluate the vulnerabilities to these attacks in four state-of-the-art VLMs which have been proposed to be of utility in healthcare: Claude 3 Opus, Claude 3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N=297 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.
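The "sub-visual prompt" idea above can be illustrated with a toy sketch: text is rendered at a pixel intensity so close to the background that a human observer is unlikely to notice it, while an OCR-capable VLM may still read and follow it. All names and values below are illustrative, not from the paper; the actual attack images embed real rendered instructions into medical scans.

```python
# Toy sketch of a sub-visual prompt: "text" pixels sit only slightly
# above the background intensity, giving near-zero visible contrast.

BACKGROUND = 10    # dark background intensity (0-255 grayscale)
PROMPT_DELTA = 2   # barely-visible offset used for the hidden text

def embed_subvisual_prompt(width, height, prompt_pixels):
    """Return a grayscale image (list of rows) in which the given
    hidden-prompt pixels are raised by PROMPT_DELTA above background."""
    image = [[BACKGROUND] * width for _ in range(height)]
    for x, y in prompt_pixels:
        image[y][x] = BACKGROUND + PROMPT_DELTA
    return image

def contrast_ratio(image):
    """Michelson contrast of the hidden text against the background."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    return (hi - lo) / (hi + lo)

# Mark a few pixels as if they spelled out an injected instruction.
img = embed_subvisual_prompt(8, 4, [(1, 1), (2, 1), (3, 2)])
print(contrast_ratio(img))  # about 0.09: far too faint to notice at a glance
```

A real attack would render a full instruction string this way; the point of the sketch is only that the contrast available to a machine reader can be made arbitrarily small relative to human perception thresholds.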
Related papers
- Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings [51.73411055162861]
We introduce a safety evaluation protocol tailored to the medical domain in both patient user and clinician user perspectives.
This is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming taking three different points of view.
arXiv Detail & Related papers (2025-07-09T19:38:58Z)
- Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models [7.916129615051081]
We introduce a dataset comprising over 34,000 synthetic images generated by diffusion models.
The dataset includes 214 human-annotated images that serve as a gold-standard reference for validation.
arXiv Detail & Related papers (2025-06-25T07:06:29Z)
- Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations [13.977100716044104]
We propose a novel inference-time defense strategy to mitigate harmful queries.
We show that our strategy enhances model safety without significantly compromising performance.
We then introduce a mixed demonstration strategy as a trade-off solution for balancing security and performance.
arXiv Detail & Related papers (2025-06-08T16:26:51Z)
- Transferable Adversarial Attacks on Black-Box Vision-Language Models [63.22532779621001]
Adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts.
We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information.
We discover that universal perturbations -- modifications applicable to a wide set of images -- can consistently induce these misinterpretations.
arXiv Detail & Related papers (2025-05-02T06:51:11Z)
- Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment [79.41098832007819]
Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems.
As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property.
We introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs.
arXiv Detail & Related papers (2025-02-04T16:04:48Z)
- BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning [71.60858267608306]
Medical foundation models are susceptible to backdoor attacks.
This work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase.
Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks.
arXiv Detail & Related papers (2024-08-14T10:18:42Z)
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
- Adversarial Attacks on Large Language Models in Medicine [34.17895005922139]
The integration of Large Language Models into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care.
The susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts.
This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks.
arXiv Detail & Related papers (2024-06-18T04:24:30Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models [12.871317188671787]
We analyze the performance of various explainable AI methods on a vision-language model, MedCLIP, to demystify its inner workings.
Our work offers a new perspective on the explainability of a recent well-known VLM in the medical domain.
arXiv Detail & Related papers (2024-03-27T20:30:01Z)
- AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions [52.9787902653558]
Large Vision-Language Models (LVLMs) have shown significant progress in responding effectively to users' visual instructions.
Despite the critical importance of LVLMs' robustness against such threats, current research in this area remains limited.
We introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions.
arXiv Detail & Related papers (2024-03-14T12:51:07Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Demonstration of an Adversarial Attack Against a Multimodal Vision Language Model for Pathology Imaging [1.279856000554626]
This study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a Vision Language Foundation model, under targeted attacks.
We employ Projected Gradient Descent (PGD) adversarial perturbation attacks to induce misclassifications intentionally.
The study emphasizes the pressing need for robust defenses to ensure the reliability of AI models.
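The PGD attack described above can be sketched in miniature: each step ascends the loss, then projects the perturbed image back into an L-infinity ball around the clean image. The gradient signs below are stand-ins supplied by the caller; a real attack backpropagates through the target model, and all names and values here are illustrative, not from the PLIP study.

```python
# Minimal sketch of one Projected Gradient Descent (PGD) step under an
# L-infinity budget. `grad_sign` stands in for sign(dLoss/dx) from a
# real backward pass through the attacked model.

def pgd_step(x, x_orig, grad_sign, alpha, eps):
    """Take one ascent step of size alpha, then project the result back
    into the eps-ball around the clean image and the valid pixel range."""
    perturbed = []
    for xi, x0, g in zip(x, x_orig, grad_sign):
        xi = xi + alpha * g                    # move along the loss gradient
        xi = min(max(xi, x0 - eps), x0 + eps)  # L-infinity projection
        xi = min(max(xi, 0.0), 1.0)            # clamp to valid pixel range
        perturbed.append(xi)
    return perturbed

# One step on a toy 3-pixel "image" (values chosen to be exact in binary):
print(pgd_step([0.5, 0.25, 1.0], [0.5, 0.25, 1.0],
               [1.0, -1.0, 1.0], alpha=0.25, eps=0.125))
# -> [0.625, 0.125, 1.0]
```

Iterating this step, each time recomputing the gradient sign on the perturbed image, yields the standard multi-step PGD attack; the eps-ball projection is what keeps the perturbation imperceptibly small.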
arXiv Detail & Related papers (2024-01-04T22:49:15Z)
- Self-Diagnosis and Large Language Models: A New Front for Medical Misinformation [8.738092015092207]
We evaluate the capabilities of large language models (LLMs) from the lens of a general user self-diagnosing.
We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases.
We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
- On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z)
- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision-language models (VLMs).
We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.