Related papers: "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

Related papers

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation [50.87199039334856]
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications.<n>Recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries.<n>We introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems.
arXiv Detail & Related papers (2026-02-10T01:27:46Z)
Exploiting Web Search Tools of AI Agents for Data Exfiltration [0.46664938579243564]
Large language models (LLMs) are now routinely used to execute complex tasks, from natural language processing to dynamic like web searches.<n>The usage of tool-calling and Retrieval Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and vulnerability to abuse.<n>We analyze how susceptible current LLMs are to indirect prompt injection attacks, which parameters, including model size and manufacturer, shape their vulnerability, and which attack methods remain most effective.
arXiv Detail & Related papers (2025-10-10T07:39:01Z)
External Data Extraction Attacks against Retrieval-Augmented Large Language Models [70.47869786522782]
RAG has emerged as a key paradigm for enhancing large language models (LLMs)<n>RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim.<n>We present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs.
arXiv Detail & Related papers (2025-10-03T12:53:45Z)
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface [4.823988025629304]
Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses.<n>Existing research has demonstrated that RAGs can leak sensitive information through training data memorization or adversarial prompts, and RAG systems inherit many of these vulnerabilities.<n>Despite these risks, there is currently no formal framework that defines the threat landscape for RAG systems.
arXiv Detail & Related papers (2025-09-24T17:11:35Z)
Safety Devolution in AI Agents [56.482973617087254]
This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation.<n>Retrieval-augmented agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval.<n>These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-augmented and increasingly autonomous AI systems.
arXiv Detail & Related papers (2025-05-20T11:21:40Z)
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models [4.620537391830117]
Large language models (LLMs) are susceptible to hallucinations, which can lead to incorrect or misleading outputs.<n>Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources.<n>In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites.
arXiv Detail & Related papers (2025-05-10T09:36:28Z)
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG) FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
TrustRAG: Enhancing Robustness and Trustworthiness in RAG [31.231916859341865]
TrustRAG is a framework that systematically filters compromised and irrelevant contents before they are retrieved for generation. TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations. RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs. This paper investigates both the retrieval and the generation components of RAG systems to understand how to enhance their robustness against such attacks.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks [18.576435409729655]
We propose an agent-based automated privacy attack called RAG-Thief. It can extract a scalable amount of private data from the private database used in RAG applications. Our findings highlight the privacy vulnerabilities in current RAG applications and underscore the pressing need for stronger safeguards.
arXiv Detail & Related papers (2024-11-21T13:18:03Z)
ShieldGemma: Generative AI Content Moderation Based on Gemma [49.91147965876678]
ShieldGemma is a suite of safety content moderation models built upon Gemma2. Models provide robust, state-of-the-art predictions of safety risks across key harm types.
arXiv Detail & Related papers (2024-07-31T17:48:14Z)
Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [21.01313168005792]
We reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making.
arXiv Detail & Related papers (2024-07-18T17:55:55Z)
Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy. Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models. This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models. Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
arXiv Detail & Related papers (2024-05-30T19:46:36Z)
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation [47.42366169887162]
Credibility-aware Generation (CAG) aims to equip models with the ability to discern and process information based on its credibility. Our model can effectively understand and utilize credibility for generation, significantly outperform other models with retrieval augmentation, and exhibit resilience against the disruption caused by noisy documents.
arXiv Detail & Related papers (2024-04-10T07:56:26Z)
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data. In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents [49.30553350788524]
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge. Existing RAG models often treat LLMs as passive recipients of information. We introduce ActiveRAG, a multi-agent framework that mimics human learning behavior.
arXiv Detail & Related papers (2024-02-21T06:04:53Z)
Whispers in the Machine: Confidentiality in LLM-integrated Systems [7.893457690926516]
Large Language Models (LLMs) are increasingly augmented with external tools and commercial services into LLM-integrated systems. Manipulated integrations can exploit the model and compromise sensitive data accessed through other interfaces. We introduce a systematic approach to evaluate confidentiality risks in LLM-integrated systems.
arXiv Detail & Related papers (2024-02-10T11:07:24Z)
Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server. Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples. We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.