Related papers: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

URL: http://arxiv.org/abs/2405.20775v1
Date: Sun, 26 May 2024 19:11:21 GMT
Title: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Authors: Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan,
Abstract summary: This paper delves into the underexplored security vulnerabilities of MedMLLMs. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks. Evaluations with this dataset and novel attack methods, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches.
Score: 8.398342612100574
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: mismatched malicious attack (2M-attack) and optimized mismatched malicious attack (O2M-attack). Using our own constructed voluminous 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. For further research and replication, anonymous access to our code is available at https://github.com/dirtycomputer/O2M_attack. Warning: Medical large model jailbreaking may generate content that includes unverified diagnoses and treatment recommendations. Always consult professional medical advice.

Related papers

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS [12.649568006596956]
Large language model-based multi-agent systems (LLM-MAS) effectively accomplish complex and dynamic tasks through inter-agent communication.<n>Existing attack methods targeting LLM-MAS either compromise agent internals or rely on direct and overt persuasion.<n>We propose MAST, a Multi-round Adaptive Stealthy Tampering framework designed to exploit communication vulnerabilities within the system.
arXiv Detail & Related papers (2025-08-05T06:14:53Z)
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations [13.977100716044104]
We propose a novel inference-time defense strategy to mitigate harmful queries.<n>We show that our strategy enhances model safety without significantly compromising performance.<n>We then introduce a mixed demonstration strategy as a trade-off solution for balancing security and performance.
arXiv Detail & Related papers (2025-06-08T16:26:51Z)
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge.<n>We then introduce our medical-specialized MLLM: Lingshu.<n>Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models [83.80177564873094]
We propose a unified multimodal universal jailbreak attack framework.<n>We evaluate the undesirable context generation of MLLMs like LLaVA, Yi-VL, MiniGPT4, MiniGPT-v2, and InstructBLIP.<n>This study underscores the urgent need for robust safety measures in MLLMs.
arXiv Detail & Related papers (2025-06-02T04:33:56Z)
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. Certain scenarios suffer 25 times higher attack rates. Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z)
Towards Harmless Multimodal Assistants with Blind Preference Optimization [49.044737689613164]
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. Due to the effectiveness of preference optimization in aligning MLLMs with human preferences, there is an urgent need for safety-related preference data for MLLMs. We construct the MMSafe-PO preference dataset towards harmless multimodal assistants, featuring multimodal instructions, the conversational format, and ranked paired responses from human feedback.
arXiv Detail & Related papers (2025-03-18T12:02:38Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG) RAG enhances MLLMs by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment [79.41098832007819]
Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. We introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs.
arXiv Detail & Related papers (2025-02-04T16:04:48Z)
Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare [15.438265972219869]
Large language models (LLMs) are increasingly utilized in healthcare applications. This study systematically assesses the vulnerabilities of seven LLMs to three advanced black-box jailbreaking techniques.
arXiv Detail & Related papers (2025-01-27T22:07:52Z)
Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks. ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
arXiv Detail & Related papers (2024-10-21T07:19:19Z)
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration [90.36429361299807]
multimodal large language models (MLLMs) have demonstrated remarkable success in engaging in conversations involving visual inputs. The integration of visual modality has introduced a unique vulnerability: the MLLM becomes susceptible to malicious visual inputs. We introduce a technique termed CoCA, which amplifies the safety-awareness of the MLLM by calibrating its output distribution.
arXiv Detail & Related papers (2024-09-17T17:14:41Z)
Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534]
Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively. These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers. We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
arXiv Detail & Related papers (2024-07-31T15:02:46Z)
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
Adversarial Attacks on Large Language Models in Medicine [34.17895005922139]
The integration of Large Language Models into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. The susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks.
arXiv Detail & Related papers (2024-06-18T04:24:30Z)
Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes [0.0]
Large Language Models (LLMs) have gained widespread adoption across various domains, including chatbots and auto-task completion agents. These models are susceptible to safety vulnerabilities such as jailbreaking, prompt injection, and privacy leakage attacks. This study investigates the impact of these modifications on LLM safety, a critical consideration for building reliable and secure AI systems.
arXiv Detail & Related papers (2024-04-05T20:31:45Z)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI) Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs) Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities. We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs) Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models [41.708401515627784]
We observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images. We introduce MM-SafetyBench, a framework designed for conducting safety-critical evaluations of MLLMs against such image-based manipulations. Our work underscores the need for a concerted effort to strengthen and enhance the safety measures of open-source MLLMs against potential malicious exploits.
arXiv Detail & Related papers (2023-11-29T12:49:45Z)
Medical Foundation Models are Susceptible to Targeted Misinformation Attacks [3.252906830953028]
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains. We demonstrate a concerning vulnerability of LLMs in medicine through targeted manipulation of just 1.1% of the model's weights. We validate our findings in a set of 1,038 incorrect biomedical facts.
arXiv Detail & Related papers (2023-09-29T06:44:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.