Exposing Privacy Risks in Graph Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2508.17222v1
- Date: Sun, 24 Aug 2025 06:19:44 GMT
- Title: Exposing Privacy Risks in Graph Retrieval-Augmented Generation
- Authors: Jiale Liu, Jiahao Zhang, Suhang Wang,
- Abstract summary: Graph RAG is a powerful technique for enhancing Large Language Models (LLMs) with external, up-to-date knowledge.<n>This paper investigates the data extraction vulnerabilities of the Graph RAG systems.<n>Our findings reveal a critical trade-off: while Graph RAG systems may reduce raw text leakage, they are significantly more vulnerable to the extraction of structured entity and relationship information.
- Score: 32.961000274379835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) is a powerful technique for enhancing Large Language Models (LLMs) with external, up-to-date knowledge. Graph RAG has emerged as an advanced paradigm that leverages graph-based knowledge structures to provide more coherent and contextually rich answers. However, the move from plain document retrieval to structured graph traversal introduces new, under-explored privacy risks. This paper investigates the data extraction vulnerabilities of the Graph RAG systems. We design and execute tailored data extraction attacks to probe their susceptibility to leaking both raw text and structured data, such as entities and their relationships. Our findings reveal a critical trade-off: while Graph RAG systems may reduce raw text leakage, they are significantly more vulnerable to the extraction of structured entity and relationship information. We also explore potential defense mechanisms to mitigate these novel attack surfaces. This work provides a foundational analysis of the unique privacy challenges in Graph RAG and offers insights for building more secure systems.
Related papers
- Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses [14.432465518281804]
We show that adversaries can reconstruct subgraphs from a target RAG system's knowledge graph, enabling privacy inference and replication of curated knowledge assets.<n>Existing attacks are largely ineffective against Graph RAG even with simple prompt-based safeguards.<n>We introduce GRASP, a closed-box, multi-turn subgraph reconstruction attack.
arXiv Detail & Related papers (2026-02-06T08:44:22Z) - Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems [29.89127594311822]
Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning.<n>We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph.<n>We propose AGEA, a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline.
arXiv Detail & Related papers (2026-01-21T05:20:54Z) - TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks [3.3238054848751535]
We introduce TrustGLM, a comprehensive study evaluating the vulnerability of GraphLLMs to adversarial attacks across three dimensions: text, graph structure, and prompt manipulations.<n>Our findings reveal that GraphLLMs are highly susceptible to text attacks that merely replace a few semantically similar words in a node's textual attribute.<n>We also find that standard graph structure attack methods can significantly degrade model performance, while random shuffling of the candidate label set in prompt templates leads to substantial performance drops.
arXiv Detail & Related papers (2025-06-13T14:48:01Z) - RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline.<n>RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components.<n>Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z) - GraphRAG under Fire [13.69098945498758]
This work examines GraphRAG's vulnerability to poisoning attacks, uncovering an intriguing security paradox.<n>Existing RAG poisoning attacks are less effective under GraphRAG than conventional RAG, due to GraphRAG's graph-based indexing and retrieval.<n>We present GragPoison, a novel attack that exploits shared relations in the underlying knowledge graph to craft poisoning text.
arXiv Detail & Related papers (2025-01-23T19:33:16Z) - Retrieval-Augmented Generation with Graphs (GraphRAG) [84.29507404866257]
Retrieval-augmented generation (RAG) is a powerful technique that enhances downstream task execution by retrieving additional information.<n>Graph, by its intrinsic "nodes connected by edges" nature, encodes massive heterogeneous and relational information.<n>Unlike conventional RAG, the uniqueness of graph-structured data, such as diverse-formatted and domain-specific relational knowledge, poses unique and significant challenges when designing GraphRAG for different domains.
arXiv Detail & Related papers (2024-12-31T06:59:35Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Self-Guided Robust Graph Structure Refinement [37.235898707554284]
We propose a self-guided graph structure refinement (GSR) framework to defend GNNs against adversarial attacks.
In this paper, we demonstrate the effectiveness of SG-GSR under various scenarios including non-targeted attacks, targeted attacks, feature attacks, e-commerce fraud, and noisy node labels.
arXiv Detail & Related papers (2024-02-19T05:00:07Z) - On provable privacy vulnerabilities of graph representations [34.45433384694758]
Graph representation learning (GRL) is critical for extracting insights from complex network structures.
It also raises security concerns due to potential privacy vulnerabilities in these representations.
This paper investigates the structural vulnerabilities in graph neural models where sensitive topological information can be inferred through edge reconstruction attacks.
arXiv Detail & Related papers (2024-02-06T14:26:22Z) - GraphMAE: Self-Supervised Masked Graph Autoencoders [52.06140191214428]
We present a masked graph autoencoder GraphMAE that mitigates issues for generative self-supervised graph learning.
We conduct extensive experiments on 21 public datasets for three different graph learning tasks.
The results manifest that GraphMAE--a simple graph autoencoder with our careful designs--can consistently generate outperformance over both contrastive and generative state-of-the-art baselines.
arXiv Detail & Related papers (2022-05-22T11:57:08Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.