Pruning Minimal Reasoning Graphs for Efficient Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2602.04926v1
- Date: Wed, 04 Feb 2026 08:48:11 GMT
- Title: Pruning Minimal Reasoning Graphs for Efficient Retrieval-Augmented Generation
- Authors: Ning Wang, Kuanyan Zhu, Daniel Yuehwoon Yee, Yitang Gao, Shiying Huang, Zirun Xu, Sainyam Galhotra,
- Abstract summary: We present AutoPrunedRetriever, a graph-style RAG system that persists the minimal reasoning subgraph built for earlier questions and incrementally extends it for later ones.<n>AutoPrunedRetriever stores entities and relations in a compact, ID-indexed codebook and represents questions, facts, and answers as edge sequences.<n>On our harder and TV benchmarks, AutoPrunedRetriever again ranks first, while using up two orders of magnitude fewer tokens than graph-heavy baselines.
- Score: 7.922207753581737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) is now standard for knowledge-intensive LLM tasks, but most systems still treat every query as fresh, repeatedly re-retrieving long passages and re-reasoning from scratch, inflating tokens, latency, and cost. We present AutoPrunedRetriever, a graph-style RAG system that persists the minimal reasoning subgraph built for earlier questions and incrementally extends it for later ones. AutoPrunedRetriever stores entities and relations in a compact, ID-indexed codebook and represents questions, facts, and answers as edge sequences, enabling retrieval and prompting over symbolic structure instead of raw text. To keep the graph compact, we apply a two-layer consolidation policy (fast ANN/KNN alias detection plus selective $k$-means once a memory threshold is reached) and prune low-value structure, while prompts retain only overlap representatives and genuinely new evidence. We instantiate two front ends: AutoPrunedRetriever-REBEL, which uses REBEL as a triplet parser, and AutoPrunedRetriever-llm, which swaps in an LLM extractor. On GraphRAG-Benchmark (Medical and Novel), both variants achieve state-of-the-art complex reasoning accuracy, improving over HippoRAG2 by roughly 9--11 points, and remain competitive on contextual summarize and generation. On our harder STEM and TV benchmarks, AutoPrunedRetriever again ranks first, while using up to two orders of magnitude fewer tokens than graph-heavy baselines, making it a practical substrate for long-running sessions, evolving corpora, and multi-agent pipelines.
Related papers
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps.<n>Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z) - Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries [53.99620546358492]
Real-world use cases often present RAG systems with complex queries for which relevant information is missing from the corpus or is incomplete.<n>Existing RAG benchmarks rarely reflect realistic task complexity for multi-hop or out-of-scope questions.<n>We present the first pipeline for automatic, difficulty-controlled creation of un$underlinec$heatable, $underliner$ealistic, $underlineu$nanswerable, and $underlinem$ulti-hop.
arXiv Detail & Related papers (2025-10-13T21:38:04Z) - EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora [20.890240791042302]
Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus.<n>We introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates.<n>Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures.
arXiv Detail & Related papers (2025-06-26T03:01:33Z) - Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation [69.45495166424642]
We develop a robust and discriminative QA benchmark to measure temporal, causal, and character consistency understanding in narrative documents.<n>We then introduce Entity-Event RAG (E2RAG), a dual-graph framework that keeps separate entity and event subgraphs linked by a bipartite mapping.<n>Across ChronoQA, our approach outperforms state-of-the-art unstructured and KG-based RAG baselines, with notable gains on causal and character consistency queries.
arXiv Detail & Related papers (2025-06-06T10:07:21Z) - Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models [38.17736879002141]
We compare two recent multi-stage pipelines, ReadAgent and RAPTOR, against three baselines, including DOS RAG (Document's Original Structure RAG)<n> DOS RAG consistently matches or outperforms more intricate methods on multiple long-context QA benchmarks.<n>We recommend establishing DOS RAG as a simple yet strong baseline for future RAG evaluations.
arXiv Detail & Related papers (2025-06-04T14:16:28Z) - Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information.<n>Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system.<n>We propose Align-GRAG, a novel reasoning-guided dual alignment framework in post-retrieval phrase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z) - Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval.<n>The innovative framework first create Semantic Partitioning of Linked Information, then use the Type-Specialized knowledge base to achieve Multi-Agent RAG.<n>The attribute-aware graph segmentation manages to divide knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types.<n>A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z) - SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache [20.26177496265456]
SubGCache aims to reduce inference latency by reusing computation across queries with similar structural prompts.<n>Experiments on two new datasets demonstrate that SubGCache consistently reduces inference latency with comparable and even improved generation quality.
arXiv Detail & Related papers (2025-05-16T07:39:41Z) - MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models [22.50450558103786]
In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts.<n>We propose a novel query rewriting method MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and generated results.<n> Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
arXiv Detail & Related papers (2024-08-30T07:57:30Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.