How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System
- URL: http://arxiv.org/abs/2508.17215v1
- Date: Sun, 24 Aug 2025 05:11:09 GMT
- Title: How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System
- Authors: Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Yeming Wang, Fan Mo, Pietro Liò,
- Abstract summary: We propose MedThreatRAG, a novel framework that probes vulnerabilities in medical RAG systems.<n>A key innovation of our approach is the construction of a simulated semi-open attack environment.<n>We show that MedThreatRAG reduces answer F1 scores by up to 27.66% and lowers LLaVA-Med-1.5 F1 rates to as low as 51.36%.
- Score: 21.40560864239872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Vision-Language Models (LVLMs) augmented with Retrieval-Augmented Generation (RAG) are increasingly employed in medical AI to enhance factual grounding through external clinical image-text retrieval. However, this reliance creates a significant attack surface. We propose MedThreatRAG, a novel multimodal poisoning framework that systematically probes vulnerabilities in medical RAG systems by injecting adversarial image-text pairs. A key innovation of our approach is the construction of a simulated semi-open attack environment, mimicking real-world medical systems that permit periodic knowledge base updates via user or pipeline contributions. Within this setting, we introduce and emphasize Cross-Modal Conflict Injection (CMCI), which embeds subtle semantic contradictions between medical images and their paired reports. These mismatches degrade retrieval and generation by disrupting cross-modal alignment while remaining sufficiently plausible to evade conventional filters. While basic textual and visual attacks are included for completeness, CMCI demonstrates the most severe degradation. Evaluations on IU-Xray and MIMIC-CXR QA tasks show that MedThreatRAG reduces answer F1 scores by up to 27.66% and lowers LLaVA-Med-1.5 F1 rates to as low as 51.36%. Our findings expose fundamental security gaps in clinical RAG systems and highlight the urgent need for threat-aware design and robust multimodal consistency checks. Finally, we conclude with a concise set of guidelines to inform the safe development of future multimodal medical RAG systems.
Related papers
- Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation [27.594129219205954]
multimodal medical retrieval-augmented generation (MMed-RAG) systems are increasingly adopted in clinical decision support.<n>We propose Medusa, a novel framework for crafting cross-modal transferable adversarial attacks on MMed-RAG systems.<n>Our results reveal critical vulnerabilities in the MMed-RAG systems and highlight the necessity of benchmarking robustness in safety-critical medical applications.
arXiv Detail & Related papers (2025-11-24T16:11:01Z) - MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z) - MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation [63.38824041721275]
Low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models.<n>We propose MedVSR, a tailored framework for medical VSR.<n>We show that MedVSR significantly outperforms existing VSR models in reconstruction performance and efficiency.
arXiv Detail & Related papers (2025-09-25T14:56:59Z) - Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models [6.169451756799087]
There is an urgent need for a unified multimodal AI framework that can proactively predict individual health risks.<n>We propose VL-RiskFormer, a hierarchical stacked visual-language multimodal Transformer with a large language model (LLM) inference head embedded in its top layer.
arXiv Detail & Related papers (2025-09-22T05:26:59Z) - MIRA: A Novel Framework for Fusing Modalities in Medical RAG [6.044279952668295]
We introduce the Multimodal Intelligent Retrieval and Augmentation (MIRA) framework, designed to optimize factual accuracy in MLLM.<n>MIRA consists of two key components: (1) a calibrated Rethinking and Rearrangement module that dynamically adjusts the number of retrieved contexts to manage factual risk, and (2) A medical RAG framework integrating image embeddings and a medical knowledge base with a query-rewrite module for efficient multimodal reasoning.
arXiv Detail & Related papers (2025-07-10T16:33:50Z) - MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems [24.60202452646343]
We introduce MedSentry, a benchmark 5 000 adversarial medical prompts spanning 25 categories with 100 subthemes.<n>We develop an end-to-end attack-defense evaluation pipeline to analyze how four representative multi-agent topologies withstand attacks from 'dark-personality' agents.
arXiv Detail & Related papers (2025-05-27T07:34:40Z) - CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models [1.7042756021131187]
This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system.<n>CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification.<n>RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
arXiv Detail & Related papers (2025-04-29T16:14:55Z) - Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs)<n>In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z) - MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [104.50239783909063]
Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering.<n>This reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning in multimodal RAG.
arXiv Detail & Related papers (2025-02-25T04:23:59Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.<n>Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.<n>We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - Cross-Modal Causal Intervention for Medical Report Generation [107.76649943399168]
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance.<n> generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases.<n>We propose a two-stage framework named CrossModal Causal Representation Learning (CMCRL)<n> Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - MedViT: A Robust Vision Transformer for Generalized Medical Image
Classification [4.471084427623774]
We propose a robust yet efficient CNN-Transformer hybrid model which is equipped with the locality of CNNs and the global connectivity of vision Transformers.
Our proposed hybrid model demonstrates its high robustness and generalization ability compared to the state-of-the-art studies on a large-scale collection of standardized MedMNIST-2D datasets.
arXiv Detail & Related papers (2023-02-19T02:55:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.