"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
- URL: http://arxiv.org/abs/2406.19417v1
- Date: Wed, 26 Jun 2024 05:36:23 GMT
- Title: "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
- Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, Huan Liu,
- Abstract summary: Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
- Score: 74.05368440735468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionally changing the model's behavior. This threat is critical as it mirrors real-world usage scenarios where RAG systems interact with publicly accessible knowledge bases, such as web scrapings and user-contributed data pools. To be more realistic, we target a realistic setting where the adversary has no knowledge of users' queries, knowledge base data, and the LLM parameters. We demonstrate that it is possible to exploit the model successfully through crafted content uploads with access to the retriever. Our findings emphasize an urgent need for security measures in the design and deployment of RAG systems to prevent potential manipulation and ensure the integrity of machine-generated content.
Related papers
- RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks [18.576435409729655]
We propose an agent-based automated privacy attack called RAG-Thief.
It can extract a scalable amount of private data from the private database used in RAG applications.
Our findings highlight the privacy vulnerabilities in current RAG applications and underscore the pressing need for stronger safeguards.
arXiv Detail & Related papers (2024-11-21T13:18:03Z) - ShieldGemma: Generative AI Content Moderation Based on Gemma [49.91147965876678]
ShieldGemma is a suite of safety content moderation models built upon Gemma2.
Models provide robust, state-of-the-art predictions of safety risks across key harm types.
arXiv Detail & Related papers (2024-07-31T17:48:14Z) - Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models [21.01313168005792]
We reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation.
We explore the impact of such attacks on user cognition and decision-making.
arXiv Detail & Related papers (2024-07-18T17:55:55Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems.
We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models.
Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
arXiv Detail & Related papers (2024-05-30T19:46:36Z) - Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation [47.42366169887162]
Credibility-aware Generation (CAG) aims to equip models with the ability to discern and process information based on its credibility.
Our model can effectively understand and utilize credibility for generation, significantly outperform other models with retrieval augmentation, and exhibit resilience against the disruption caused by noisy documents.
arXiv Detail & Related papers (2024-04-10T07:56:26Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents [49.30553350788524]
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge.
Existing RAG models often treat LLMs as passive recipients of information.
We introduce ActiveRAG, a multi-agent framework that mimics human learning behavior.
arXiv Detail & Related papers (2024-02-21T06:04:53Z) - Whispers in the Machine: Confidentiality in LLM-integrated Systems [7.893457690926516]
Large Language Models (LLMs) are increasingly augmented with external tools and commercial services into LLM-integrated systems.
Manipulated integrations can exploit the model and compromise sensitive data accessed through other interfaces.
We introduce a systematic approach to evaluate confidentiality risks in LLM-integrated systems.
arXiv Detail & Related papers (2024-02-10T11:07:24Z) - Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.