Related papers: The Hidden Threat in Plain Text: Attacking RAG Data Loaders

The Hidden Threat in Plain Text: Attacking RAG Data Loaders

URL: http://arxiv.org/abs/2507.05093v1
Date: Mon, 07 Jul 2025 15:13:54 GMT
Title: The Hidden Threat in Plain Text: Attacking RAG Data Loaders
Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi,
Abstract summary: Large Language Models (LLMs) have transformed human-machine interaction since ChatGPT's 2022 debut.<n>RAG's reliance on ingesting external documents introduces new vulnerabilities.<n>This paper exposes a critical security gap at the data loading stage, where malicious actors can stealthily corrupt RAG pipelines by exploiting document ingestion.
Score: 14.418668801863422
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) have transformed human-machine interaction since ChatGPT's 2022 debut, with Retrieval-Augmented Generation (RAG) emerging as a key framework that enhances LLM outputs by integrating external knowledge. However, RAG's reliance on ingesting external documents introduces new vulnerabilities. This paper exposes a critical security gap at the data loading stage, where malicious actors can stealthily corrupt RAG pipelines by exploiting document ingestion. We propose a taxonomy of 9 knowledge-based poisoning attacks and introduce two novel threat vectors -- Content Obfuscation and Content Injection -- targeting common formats (DOCX, HTML, PDF). Using an automated toolkit implementing 19 stealthy injection techniques, we test five popular data loaders, finding a 74.4% attack success rate across 357 scenarios. We further validate these threats on six end-to-end RAG systems -- including white-box pipelines and black-box services like NotebookLM and OpenAI Assistants -- demonstrating high success rates and critical vulnerabilities that bypass filters and silently compromise output integrity. Our results emphasize the urgent need to secure the document ingestion process in RAG systems against covert content manipulations.

Related papers

The Hidden Dangers of Browsing AI Agents [0.0]
This paper presents a comprehensive security evaluation of such agents, focusing on systemic vulnerabilities across multiple architectural layers.<n>Our work outlines the first end-to-end threat model for browsing agents and provides actionable guidance for securing their deployment in real-world environments.
arXiv Detail & Related papers (2025-05-19T13:10:29Z)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z)
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models [0.0]
Large Language Models (LLMs) continue to exhibit vulnerabilities to jailbreaking attacks.<n>We present AutoAdv, a novel framework that automates adversarial prompt generation.<n>We show that our attacks achieve jailbreak success rates of up to 86% for harmful content generation.
arXiv Detail & Related papers (2025-04-18T08:38:56Z)
The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems [8.347617177093056]
We introduce a realistic black-box attack based on the RAG paradox, a structural vulnerability arising from the system's effort to enhance trust.<n>Unlike prior work that focuses solely on improving document retrievability, our attack method explicitly considers both retrievability and user trust.<n>Our method significantly degrades system performance without internal access, while generating natural-looking poisoned documents.
arXiv Detail & Related papers (2025-02-28T12:32:53Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG)<n>RAG enhances MLLMs by grounding responses in query-relevant external knowledge.<n>This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.<n>We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors [15.861833242429228]
We investigate data extraction attacks targeting RAG's knowledge databases.<n>We show that previous prompt injection-based extraction attacks largely rely on the instruction-following capabilities of LLMs.<n>We propose to backdoor RAG, where a small portion of poisoned data is injected during the fine-tuning phase to create a backdoor within the LLM.
arXiv Detail & Related papers (2024-11-03T22:27:40Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation [30.63258739968483]
Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) We propose new attack vectors that allow an adversary to inject a single malicious document into a RAG system's knowledge base, and mount a backdoor poisoning attack. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama, and show that they transfer to GPT-3.5 Turbo and GPT-4.
arXiv Detail & Related papers (2024-05-30T21:19:24Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.<n>Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.<n>We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.