Related papers: MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

URL: http://arxiv.org/abs/2502.17832v3
Date: Wed, 08 Oct 2025 02:51:51 GMT
Title: MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Authors: Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji,
Abstract summary: Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering.<n>This reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning in multimodal RAG.
Score: 104.50239783909063
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal large language models with Retrieval Augmented Generation (RAG) have significantly advanced tasks such as multimodal question answering by grounding responses in external text and images. This grounding improves factuality, reduces hallucination, and extends reasoning beyond parametric knowledge. However, this reliance on external knowledge poses a critical yet underexplored safety risk: knowledge poisoning attacks, where adversaries deliberately inject adversarial multimodal content into external knowledge bases to steer model toward generating incorrect or even harmful responses. To expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to systematically design knowledge poisoning in multimodal RAG. We introduce two complementary attack strategies: Localized Poisoning Attack (LPA), which implants targeted multimodal misinformation to manipulate specific queries, and Globalized Poisoning Attack (GPA), which inserts a single adversarial knowledge to broadly disrupt reasoning and induce nonsensical responses across all queries. Comprehensive experiments across tasks, models, and access settings show that LPA achieves targeted manipulation with attack success rates of up to 56%, while GPA completely disrupts model generation to 0% accuracy with just a single adversarial knowledge injection. Our results reveal the fragility of multimodal RAG and highlight the urgent need for defenses against knowledge poisoning.

Related papers

Hidden in the Metadata: Stealth Poisoning Attacks on Multimodal Retrieval-Augmented Generation [0.8103046443444949]
We present MM-MEPA, a multimodal poisoning attack that targets the metadata components of image-text entries while leaving the associated visual content unaltered.<n> MM-MEPA achieves an attack success rate of up to 91% consistently disrupting system behaviors across four retrievers and two multimodal generators.
arXiv Detail & Related papers (2026-02-26T15:59:45Z)
Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation [50.87199039334856]
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications.<n>Recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries.<n>We introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems.
arXiv Detail & Related papers (2026-02-10T01:27:46Z)
Confundo: Learning to Generate Robust Poison for Practical RAG Systems [19.77771071590713]
Confundo is a learning-to-poison framework that fine-tunes a large language model as a poison generator to achieve high effectiveness, robustness, and stealthiness.<n>We show how Confundo consistently outperforms a wide range of purpose-built attacks across datasets and RAG configurations.<n>We also present a defensive use case that protects web content from unauthorized incorporation into RAG systems via scraping.
arXiv Detail & Related papers (2026-02-06T11:19:49Z)
Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM [23.316684225491002]
We propose Spa-VLM (Stealthy Poisoning Attack on RAG-based VLM), a new paradigm for poisoning attacks on large models.<n>We craft malicious multi-modal knowledge entries, including adversarial images and misleading text, which are then injected into the RAG's knowledge base.<n>Results demonstrate that our method achieves highly stealthy poisoning, with the attack success rate exceeding 0.8.
arXiv Detail & Related papers (2025-05-28T07:44:10Z)
Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs [54.90315421117162]
We propose a novel poisoning method via completely harmless data.<n>Inspired by the causal reasoning in auto-regressive LLMs, we aim to establish robust associations between triggers and an affirmative response prefix.<n>We observe an interesting resistance phenomenon where the LLM initially appears to agree but subsequently refuses to answer.
arXiv Detail & Related papers (2025-05-23T08:13:59Z)
One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems [19.179465547413848]
Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have shown improved performance in generating accurate responses.<n> dependence on external knowledge bases introduces potential security vulnerabilities.<n>This paper reveals a more realistic knowledge poisoning attack against RAG systems that achieves successful attacks by poisoning only a single document.
arXiv Detail & Related papers (2025-05-15T08:14:58Z)
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
RevPRAG is a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection.<n>Our results on multiple benchmark datasets and RAG architectures show our approach could achieve 98% true positive rate, while maintaining false positive rates close to 1%.
arXiv Detail & Related papers (2024-11-28T06:29:46Z)
mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge. We propose a novel framework called textbfRetrieval-textbfReftextbfAugmented textbfGeneration (mR$2$AG) which achieves adaptive retrieval and useful information localization. mR$2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA
arXiv Detail & Related papers (2024-11-22T16:15:50Z)
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models [18.301965456681764]
We reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG) HijackRAG enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. We propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge.
arXiv Detail & Related papers (2024-10-30T09:15:51Z)
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning. On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
arXiv Detail & Related papers (2024-02-12T18:28:36Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities. Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content. We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks [10.732558183444985]
Malicious actors can covertly exploit large language models (LLMs) vulnerabilities through poisoning attacks aimed at generating undesirable outputs. This paper explores various poisoning techniques to assess their effectiveness across a range of generative tasks. We show that it is possible to successfully poison an LLM during the fine-tuning stage using as little as 1% of the total tuning data samples.
arXiv Detail & Related papers (2023-12-07T23:26:06Z)
On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. We present ROAR, a new class of attacks that instantiate a variety of such threats. We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.