Related papers: MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

URL: http://arxiv.org/abs/2512.08289v1
Date: Tue, 09 Dec 2025 06:38:16 GMT
Title: MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks
Authors: Tailun Chen, Yu He, Yan Wang, Shuo Shao, Haolun Zheng, Zhihao Liu, Jinfeng Li, Yuefeng Chen, Zhixuan Chu, Zhan Qin,
Abstract summary: Retrieval-Augmented Generation (RAG) systems introduce a critical attack surface: corpus poisoning.<n>We propose MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments.<n>Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness.
Score: 47.46936341268548
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of Test-Time Preference Optimization (TPO) to maximize persuasion. To rigorously evaluate this threat, we construct a new benchmark derived from three long-form, domain-specific datasets. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness, exhibiting remarkable transferability across diverse retriever-LLM configurations and highlighting the urgent need for robust defense strategies.

Related papers

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation [50.87199039334856]
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications.<n>Recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries.<n>We introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems.
arXiv Detail & Related papers (2026-02-10T01:27:46Z)
Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection [8.874341026403854]
This paper introduces a novel detection framework that integrates Chain-of-Thought (CoT) reasoning with Retrieval-Augmented Generation (RAG)<n>We systematically evaluate compact ODLLMs, including LLaMA 3.2 (1B, 3B) and Gemma 3 (1B, 4B), using structured prompting and exemplar-driven reasoning strategies.<n> Experimental results demonstrate substantial performance improvements with few-shot prompting, achieving macro-average F1 scores as high as 0.85.
arXiv Detail & Related papers (2026-01-20T15:18:56Z)
External Data Extraction Attacks against Retrieval-Augmented Large Language Models [70.47869786522782]
RAG has emerged as a key paradigm for enhancing large language models (LLMs)<n>RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim.<n>We present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs.
arXiv Detail & Related papers (2025-10-03T12:53:45Z)
Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation [7.441679541836913]
Token-level Precise Attack on the RAG (TPARAG) is a novel framework that targets both white-box and black-box RAG systems.<n>TPARAG consistently outperforms previous approaches in retrieval-stage and end-to-end attack effectiveness.
arXiv Detail & Related papers (2025-08-05T05:44:19Z)
MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems [31.53306157650065]
Multimodal retrieval-augmented generation (RAG) systems enhance large vision-language models by integrating cross-modal knowledge.<n>These knowledge databases may contain sensitive information that requires privacy protection.<n>MrM is the first black-box MIA framework targeted at multimodal RAG systems.
arXiv Detail & Related papers (2025-06-09T03:48:50Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities.<n>This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set.<n>We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z)
OET: Optimization-based prompt injection Evaluation Toolkit [25.148709805243836]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation.<n>Their susceptibility to prompt injection attacks poses significant security risks.<n>Despite numerous defense strategies, a standardized framework to rigorously evaluate their effectiveness is lacking.
arXiv Detail & Related papers (2025-05-01T20:09:48Z)
TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation [31.231916859341865]
TrustRAG is a framework that systematically filters malicious and irrelevant content before it is retrieved for generation.<n>TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations.<n>RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into the retrieval corpus can mislead models into producing factually incorrect outputs.<n>We present a rigorously controlled empirical study of how RAG systems behave under such attacks and how their robustness can be improved.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions. We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.