Related papers: SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache

URL: http://arxiv.org/abs/2505.10951v2
Date: Mon, 19 May 2025 17:51:26 GMT
Title: SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache
Authors: Qiuyu Zhu, Liang Zhang, Qianxiong Xu, Cheng Long, Jie Zhang
Abstract summary: SubGCache aims to reduce inference latency by reusing computation across queries with similar structural prompts. Experiments on two new datasets demonstrate that SubGCache consistently reduces inference latency with comparable and even improved generation quality.
Score: 20.26177496265456
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph-based retrieval-augmented generation (RAG) enables large language models (LLMs) to incorporate structured knowledge via graph retrieval as contextual input, enabling more accurate and context-aware reasoning. We observe that different queries can retrieve similar subgraphs as prompts, and thus we propose SubGCache, which aims to reduce inference latency by reusing computation across queries with similar structural prompts (i.e., subgraphs). Specifically, SubGCache clusters queries based on subgraph embeddings, constructs a representative subgraph for each cluster, and pre-computes the key-value (KV) cache of the representative subgraph. For each query whose retrieved subgraph falls within a cluster, it reuses the pre-computed KV cache of that cluster's representative subgraph instead of recomputing the KV tensors, which saves computation. Experiments on two new datasets across multiple LLM backbones and graph-based RAG frameworks demonstrate that SubGCache consistently reduces inference latency with comparable and even improved generation quality, achieving up to 6.68$\times$ reduction in time-to-first-token (TTFT).
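
The pipeline described in the abstract (cluster queries by subgraph embedding, build one representative subgraph per cluster, prefill its KV cache once, reuse it for every query in the cluster) can be illustrated with a small sketch. This is not the authors' implementation: the greedy cosine-threshold clustering, the union-of-triples representative, the `PrefixCache` class, and the `fake_prefill` stand-in for an LLM prefill pass are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail); hypothetical subgraph encoding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_embedding(embeddings: List[List[float]],
                         threshold: float = 0.9) -> List[List[int]]:
    """Greedy clustering (an assumption, not the paper's method): a query joins
    the first cluster whose first member is similar enough, else starts a new one."""
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for members in clusters:
            if cosine(emb, embeddings[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative_subgraph(subgraphs: List[Set[Triple]]) -> List[Triple]:
    """Assumed construction: the sorted union of all triples in the cluster,
    so every member query's subgraph is covered by the shared prefix."""
    return sorted(set().union(*subgraphs))


class PrefixCache:
    """Stores one prefilled state per cluster and counts prefill passes."""

    def __init__(self, prefill: Callable[[List[Triple]], str]) -> None:
        self.prefill = prefill
        self.store: Dict[int, str] = {}
        self.prefill_calls = 0

    def get(self, cluster_id: int, rep: List[Triple]) -> str:
        # Prefill only on the first request for this cluster; reuse afterwards.
        if cluster_id not in self.store:
            self.store[cluster_id] = self.prefill(rep)
            self.prefill_calls += 1
        return self.store[cluster_id]


def fake_prefill(rep: List[Triple]) -> str:
    """Stand-in for the expensive LLM prefill that would yield KV tensors."""
    return "|".join(" ".join(t) for t in rep)


if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    subgraphs = [
        {("a", "r", "b")},
        {("a", "r", "b"), ("b", "r", "c")},
        {("x", "s", "y")},
    ]
    cache = PrefixCache(fake_prefill)
    for cid, members in enumerate(cluster_by_embedding(embeddings)):
        rep = representative_subgraph([subgraphs[i] for i in members])
        for _ in members:  # each query in the cluster reuses the same entry
            cache.get(cid, rep)
    print(cache.prefill_calls)  # one prefill per cluster, not per query
```

In a real system the cached value would be the KV tensors of the representative subgraph's prompt prefix, and each query would only need to process its own question tokens on top of that prefix, which is where the TTFT saving would come from.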

Related papers

Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [18.96570718233786]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The innovative framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verification.
arXiv Detail & Related papers (2025-05-20T06:44:34Z)
CLEAR: Cluster-based Prompt Learning on Heterogeneous Graphs [19.956925820094177]
We present CLEAR, a Cluster-based prompt model on heterogeneous graphs. We align the pretext and downstream tasks to share the same training objective. Experiments on downstream tasks confirm the superiority of CLEAR.
arXiv Detail & Related papers (2025-02-13T03:10:19Z)
Beyond Message Passing: Neural Graph Pattern Machine [50.78679002846741]
We introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. GPM efficiently extracts, encodes, and prioritizes task-relevant graph patterns, offering greater expressivity and improved ability to capture long-range dependencies.
arXiv Detail & Related papers (2025-01-30T20:37:47Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (GC). GC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs. Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z)
Towards Writer Retrieval for Historical Datasets [0.6445605125467572]
An unsupervised approach for writer retrieval based on clustering SIFT descriptors detected at keypoint locations. A residual network followed by our proposed NetRVLAD, an encoding layer with reduced complexity. We show that our approach achieves comparable performance on a modern dataset as well.
arXiv Detail & Related papers (2023-05-09T11:44:44Z)
EGRC-Net: Embedding-induced Graph Refinement Clustering Network [66.44293190793294]
We propose a novel graph clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net). EGRC-Net effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance. Our proposed methods consistently outperform several state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T09:08:43Z)
Dual Contrastive Attributed Graph Clustering Network [6.796682703663566]
We propose a generic framework called Dual Contrastive Attributed Graph Clustering Network (DCAGC). In DCAGC, by leveraging the Neighborhood Contrast Module, the similarity of the neighbor nodes is maximized and the quality of the node representation is improved. All the modules of DCAGC are trained and optimized in a unified framework, so the learned node representation contains clustering-oriented messages.
arXiv Detail & Related papers (2022-06-16T03:17:01Z)
Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms. The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching [58.39970828272366]
Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph in a data graph G. The matching order plays a critical role in the time efficiency of these backtracking-based subgraph matching algorithms. In this paper, for the first time we apply Reinforcement Learning (RL) and Graph Neural Network (GNN) techniques to generate a high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z)
Graph-augmented Learning to Rank for Querying Large-scale Knowledge Graph [34.774049199809426]
Knowledge graph question answering (KGQA) based on information retrieval aims to answer a question by retrieving the answer from a large-scale knowledge graph. We first propose to partition the retrieved KSG into several smaller sub-KSGs via a new subgraph partition algorithm. We then present a graph-augmented learning to rank model to select the top-ranked sub-KSGs from them.
arXiv Detail & Related papers (2021-11-20T08:27:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.