Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation
- URL: http://arxiv.org/abs/2508.03110v1
- Date: Tue, 05 Aug 2025 05:44:19 GMT
- Title: Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation
- Authors: Zizhong Li, Haopeng Zhang, Jiawei Zhang,
- Abstract summary: Token-level Precise Attack on the RAG (TPARAG) is a novel framework that targets both white-box and black-box RAG systems.<n>TPARAG consistently outperforms previous approaches in retrieval-stage and end-to-end attack effectiveness.
- Score: 7.441679541836913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large language models (LLMs) have achieved remarkable success in providing trustworthy responses for knowledge-intensive tasks, they still face critical limitations such as hallucinations and outdated knowledge. To address these issues, the retrieval-augmented generation (RAG) framework enhances LLMs with access to external knowledge via a retriever, enabling more accurate and real-time outputs about the latest events. However, this integration brings new security vulnerabilities: the risk that malicious content in the external database can be retrieved and used to manipulate model outputs. Although prior work has explored attacks on RAG systems, existing approaches either rely heavily on access to the retriever or fail to jointly consider both retrieval and generation stages, limiting their effectiveness, particularly in black-box scenarios. To overcome these limitations, we propose Token-level Precise Attack on the RAG (TPARAG), a novel framework that targets both white-box and black-box RAG systems. TPARAG leverages a lightweight white-box LLM as an attacker to generate and iteratively optimize malicious passages at the token level, ensuring both retrievability and high attack success in generation. Extensive experiments on open-domain QA datasets demonstrate that TPARAG consistently outperforms previous approaches in retrieval-stage and end-to-end attack effectiveness. These results further reveal critical vulnerabilities in RAG pipelines and offer new insights into improving their robustness.
Related papers
- CPA-RAG:Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models [15.349703228157479]
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge.<n>Existing poisoning methods for RAG systems have limitations, such as poor generalization and lack of fluency in adversarial texts.<n>We propose CPA-RAG, a black-box adversarial framework that generates query-relevant texts capable of manipulating the retrieval process to induce target answers.
arXiv Detail & Related papers (2025-05-26T11:48:32Z) - The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities.<n>This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set.<n>We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z) - POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z) - ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models [55.93380086403591]
Generative large language models are vulnerable to backdoor attacks.<n>$textitELBA-Bench$ allows attackers to inject backdoor through parameter efficient fine-tuning.<n>$textitELBA-Bench$ provides over 1300 experiments.
arXiv Detail & Related papers (2025-02-22T12:55:28Z) - SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model [17.046058202577985]
We introduce a benchmark named SafeRAG designed to evaluate the RAG security.<n>First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service.<n>We then utilize the SafeRAG dataset to simulate various attack scenarios that RAG may encounter.
arXiv Detail & Related papers (2025-01-28T17:01:31Z) - FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models [22.35026334463735]
We propose FlippedRAG, a transfer-based adversarial attack against black-box RAG systems.<n>FlippedRAG achieves on average a 50% directional shift in the opinion of RAG-generated responses.<n>These results highlight an urgent need for developing innovative defensive solutions to ensure the security and trustworthiness of RAG systems.
arXiv Detail & Related papers (2025-01-06T12:24:57Z) - Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks [12.061098193438022]
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge.
This paper investigates the security of RAG systems against end-to-end indirect prompt manipulations.
arXiv Detail & Related papers (2024-08-09T12:26:05Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - Corpus Poisoning via Approximate Greedy Gradient Descent [48.5847914481222]
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains.
arXiv Detail & Related papers (2024-06-07T17:02:35Z) - Improved Generation of Adversarial Examples Against Safety-aligned LLMs [72.38072942860309]
Adversarial prompts generated using gradient-based methods exhibit outstanding performance in performing automatic jailbreak attacks against safety-aligned LLMs.
In this paper, we explore a new perspective on this problem, suggesting that it can be alleviated by leveraging innovations inspired in transfer-based attacks.
We show that 87% of the query-specific adversarial suffixes generated by the developed combination can induce Llama-2-7B-Chat to produce the output that exactly matches the target string on AdvBench.
arXiv Detail & Related papers (2024-05-28T06:10:12Z) - RAGGED: Towards Informed Design of Scalable and Stable RAG Systems [51.171355532527365]
Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge.<n>RAGGED is a framework for systematically evaluating RAG systems.
arXiv Detail & Related papers (2024-03-14T02:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.