Practical Poisoning Attacks against Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2504.03957v1
- Date: Fri, 04 Apr 2025 21:49:42 GMT
- Title: Practical Poisoning Attacks against Retrieval-Augmented Generation
- Authors: Baolei Zhang, Yuxi Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu
- Abstract summary: Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. We propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text.
- Score: 9.320227105592917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments across multiple datasets demonstrate that CorruptRAG achieves higher attack success rates compared to existing baselines.
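The attack surface described in the abstract can be made concrete with a toy example. The sketch below is not the CorruptRAG construction itself; it is a minimal illustration, under assumed names (craft_poisoned_text, retrieve_top_k) and with a TF-IDF retriever standing in for a dense embedding model, of how a single injected text that repeats the target question can dominate retrieval and place an adversarial claim into the generator's context.

```python
# Minimal sketch of single-text knowledge-base poisoning against a toy RAG
# retriever. NOT the CorruptRAG method from the paper: the helper names and the
# TF-IDF retriever are illustrative assumptions only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def craft_poisoned_text(target_question: str, adversarial_answer: str) -> str:
    """Hypothetical helper: prepend the target question so the poisoned text
    scores highly for that query, then append the attacker's desired claim."""
    return f"{target_question} {adversarial_answer}"


def retrieve_top_k(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank corpus texts by cosine similarity of TF-IDF vectors
    (a stand-in for a dense retriever)."""
    vectorizer = TfidfVectorizer().fit(corpus + [query])
    doc_vecs = vectorizer.transform(corpus).toarray()
    query_vec = vectorizer.transform([query]).toarray()[0]
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    top = np.argsort(-sims)[:k]
    return [corpus[i] for i in top]


if __name__ == "__main__":
    knowledge_base = [
        "The Eiffel Tower is located in Paris, France.",
        "Mount Everest is the highest mountain above sea level.",
        "Water boils at 100 degrees Celsius at sea level.",
    ]
    question = "Where is the Eiffel Tower located?"

    # The attacker injects a single poisoned text into the knowledge base.
    poison = craft_poisoned_text(
        question, "The Eiffel Tower is located in Rome, Italy."
    )
    knowledge_base.append(poison)

    context = retrieve_top_k(question, knowledge_base, k=2)
    print("Retrieved context passed to the LLM:")
    for passage in context:
        print(" -", passage)
    # Because the poisoned text repeats the question verbatim, it typically ranks
    # first, so the generator sees the adversarial claim alongside (or instead of)
    # the correct passage.
```

In a real attack the retriever is a dense embedding model and the poisoned text is crafted rather than naively concatenated, but the mechanics are the same: if the single injected passage outranks the correct ones for the target query, the LLM answers from the attacker's claim.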
Related papers
- Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs [8.09404178079053]
Retrieval-Augmented Generation (RAG) integrates Large Language Models (LLMs) with external knowledge bases, improving output quality while introducing new security risks.
Existing studies on RAG vulnerabilities typically focus on exploiting the retrieval mechanism to inject erroneous knowledge or malicious texts, inducing incorrect outputs.
In this paper, we uncover a novel vulnerability: the safety guardrails of LLMs, while designed for protection, can also be exploited as an attack vector by adversaries. Building on this vulnerability, we propose MutedRAG, a novel denial-of-service attack that leverages these guardrails in reverse to undermine the availability of RAG systems.
arXiv Detail & Related papers (2025-04-30T14:18:11Z)
- Traceback of Poisoning Attacks to Retrieval-Augmented Generation [10.19539347377776]
Research has revealed RAG's susceptibility to poisoning attacks, where the attacker injects poisoned texts into the knowledge database.
Existing defenses, which predominantly focus on inference-time mitigation, have proven insufficient against sophisticated attacks.
We introduce RAGForensics, the first traceback system for RAG, designed to identify poisoned texts within the knowledge database that are responsible for the attacks.
arXiv Detail & Related papers (2025-04-30T14:10:02Z)
- PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization [13.751251342738225]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications.
They also exhibit inherent limitations, such as outdated knowledge and susceptibility to hallucinations.
Recent efforts have focused on the security of RAG-based LLMs, yet existing attack methods face three critical challenges.
We propose coordinated Prompt-RAG attack (PR-attack), a novel optimization-driven attack that introduces a small number of poisoned texts into the knowledge database.
arXiv Detail & Related papers (2025-04-10T13:09:50Z)
- Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs). In this work, we introduce Poisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
- MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Retrieval Augmented Generation (RAG) enhances multimodal large language models (MLLMs) by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
- RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis [3.706288937295861]
RevPRAG is a flexible and automated detection pipeline that leverages LLM activations to detect poisoned responses. Our results on multiple benchmark datasets and RAG architectures show that our approach achieves a 98% true positive rate while keeping false positive rates close to 1%.
arXiv Detail & Related papers (2024-11-28T06:29:46Z)
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
Retrieval Augmented Generation (RAG) addresses the outdated-knowledge problem of Large Language Models by combining them with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
arXiv Detail & Related papers (2024-10-18T14:02:34Z)
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities.
Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate limitations of LLMs such as outdated knowledge and hallucination.
We find that the knowledge database in a RAG system introduces a new and practical attack surface.
Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
arXiv Detail & Related papers (2024-02-12T18:28:36Z)
- Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder [78.01180944665089]
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems.
We present a 'backdoor poisoning' attack on NLP models.
arXiv Detail & Related papers (2020-10-06T13:03:49Z)