Adversarial Attacks on Large Language Models in Medicine
- URL: http://arxiv.org/abs/2406.12259v1
- Date: Tue, 18 Jun 2024 04:24:30 GMT
- Title: Adversarial Attacks on Large Language Models in Medicine
- Authors: Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu,
- Abstract summary: The integration of Large Language Models into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care.
The susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts.
This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks.
- Score: 34.17895005922139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks. Utilizing real-world patient data, we demonstrate that both open-source and proprietary LLMs are susceptible to manipulation across multiple tasks. This research further reveals that domain-specific tasks demand more adversarial data in model fine-tuning than general domain tasks for effective attack execution, especially for more capable models. We discover that while integrating adversarial data does not markedly degrade overall model performance on medical benchmarks, it does lead to noticeable shifts in fine-tuned model weights, suggesting a potential pathway for detecting and countering model attacks. This research highlights the urgent need for robust security measures and the development of defensive mechanisms to safeguard LLMs in medical applications, to ensure their safe and effective deployment in healthcare settings.
Related papers
- Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model.
We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models [9.860799633304298]
This paper delves into the underexplored security vulnerabilities of MedMLLMs.
By combining existing clinical medical data with atypical natural phenomena, we define the mismatched malicious attack.
We propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs.
arXiv Detail & Related papers (2024-05-26T19:11:21Z) - Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [18.624280305864804]
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP)
This paper presents a comprehensive survey of the various forms of attacks targeting LLMs.
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
arXiv Detail & Related papers (2024-03-03T04:46:21Z) - Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications.
The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs.
We developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
arXiv Detail & Related papers (2024-02-16T09:29:38Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - Forcing Generative Models to Degenerate Ones: The Power of Data
Poisoning Attacks [10.732558183444985]
Malicious actors can covertly exploit large language models (LLMs) vulnerabilities through poisoning attacks aimed at generating undesirable outputs.
This paper explores various poisoning techniques to assess their effectiveness across a range of generative tasks.
We show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1% of the total tuning data samples.
arXiv Detail & Related papers (2023-12-07T23:26:06Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z) - On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.