Related papers: CPA-RAG:Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models

CPA-RAG:Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models

URL: http://arxiv.org/abs/2505.19864v1
Date: Mon, 26 May 2025 11:48:32 GMT
Title: CPA-RAG:Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models
Authors: Chunyang Li, Junwei Zhang, Anda Cheng, Zhuo Ma, Xinghua Li, Jianfeng Ma,
Abstract summary: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge.<n>Existing poisoning methods for RAG systems have limitations, such as poor generalization and lack of fluency in adversarial texts.<n>We propose CPA-RAG, a black-box adversarial framework that generates query-relevant texts capable of manipulating the retrieval process to induce target answers.
Score: 15.349703228157479
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, but its openness introduces vulnerabilities that can be exploited by poisoning attacks. Existing poisoning methods for RAG systems have limitations, such as poor generalization and lack of fluency in adversarial texts. In this paper, we propose CPA-RAG, a black-box adversarial framework that generates query-relevant texts capable of manipulating the retrieval process to induce target answers. The proposed method integrates prompt-based text generation, cross-guided optimization through multiple LLMs, and retriever-based scoring to construct high-quality adversarial samples. We conduct extensive experiments across multiple datasets and LLMs to evaluate its effectiveness. Results show that the framework achieves over 90\% attack success when the top-k retrieval setting is 5, matching white-box performance, and maintains a consistent advantage of approximately 5 percentage points across different top-k values. It also outperforms existing black-box baselines by 14.5 percentage points under various defense strategies. Furthermore, our method successfully compromises a commercial RAG system deployed on Alibaba's BaiLian platform, demonstrating its practical threat in real-world applications. These findings underscore the need for more robust and secure RAG frameworks to defend against poisoning attacks.

Related papers

Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation [7.441679541836913]
Token-level Precise Attack on the RAG (TPARAG) is a novel framework that targets both white-box and black-box RAG systems.<n>TPARAG consistently outperforms previous approaches in retrieval-stage and end-to-end attack effectiveness.
arXiv Detail & Related papers (2025-08-05T05:44:19Z)
DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection [0.9499594220629591]
Adrial prompt attacks can significantly alter the reliability of Retrieval-Augmented Generation (RAG) systems.<n>We present a novel method that applies Differential Evolution (DE) to optimize adversarial prompt suffixes for RAG-based question answering.
arXiv Detail & Related papers (2025-07-20T16:48:20Z)
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation [12.573766276297441]
Retrieval-Augmented Generation (RAG) has proven effective in mitigating hallucinations in large language models by incorporating external knowledge during inference.<n>We propose the first comprehensive benchmark framework for evaluating poisoning attacks on RAG.
arXiv Detail & Related papers (2025-05-24T06:17:59Z)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z)
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation [71.32665836294103]
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs)<n>In this work, we introduce textitPoisoned-MRAG, the first knowledge poisoning attack on multimodal RAG systems.
arXiv Detail & Related papers (2025-03-08T15:46:38Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)<n>RAG enhances MLLMs by grounding responses in query-relevant external knowledge.<n>This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models [55.93380086403591]
Generative large language models are vulnerable to backdoor attacks.<n>$textitELBA-Bench$ allows attackers to inject backdoor through parameter efficient fine-tuning.<n>$textitELBA-Bench$ provides over 1300 experiments.
arXiv Detail & Related papers (2025-02-22T12:55:28Z)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation is a novel multi-turn jailbreak framework.<n>It reformulates harmful queries into benign reasoning tasks.<n>We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z)
FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models [22.35026334463735]
We propose FlippedRAG, a transfer-based adversarial attack against black-box RAG systems.<n>FlippedRAG achieves on average a 50% directional shift in the opinion of RAG-generated responses.<n>These results highlight an urgent need for developing innovative defensive solutions to ensure the security and trustworthiness of RAG systems.
arXiv Detail & Related papers (2025-01-06T12:24:57Z)
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge. This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
arXiv Detail & Related papers (2024-08-09T12:26:05Z)
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
Corpus Poisoning via Approximate Greedy Gradient Descent [48.5847914481222]
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages. We show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains.
arXiv Detail & Related papers (2024-06-07T17:02:35Z)
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [45.409248316497674]
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG.
arXiv Detail & Related papers (2024-02-12T18:28:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.