SAGE: Structure Aware Graph Expansion for Retrieval of Heterogeneous Data
- URL: http://arxiv.org/abs/2602.16964v1
- Date: Wed, 18 Feb 2026 23:57:19 GMT
- Title: SAGE: Structure Aware Graph Expansion for Retrieval of Heterogeneous Data
- Authors: Prasham Titiya, Rohit Khoja, Tomer Wolfson, Vivek Gupta, Dan Roth
- Abstract summary: Retrieval-augmented question answering over heterogeneous corpora requires connected evidence across text, tables, and graph nodes. Standard retriever-reader pipelines use flat similarity search over independently chunked text, missing multi-hop evidence chains across modalities. We propose the SAGE (Structure Aware Graph Expansion) framework, which constructs a chunk-level graph offline using metadata-driven similarities with percentile-based pruning. We instantiate the initial retriever using hybrid dense+sparse retrieval for implicit cross-modal corpora and SPARK (Structure Aware Planning Agent for Retrieval over Knowledge Graphs), an agentic retriever, for explicit schema graphs.
- Score: 47.930782177987446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented question answering over heterogeneous corpora requires connected evidence across text, tables, and graph nodes. While entity-level knowledge graphs support structured access, they are costly to construct and maintain, and inefficient to traverse at query time. In contrast, standard retriever-reader pipelines use flat similarity search over independently chunked text, missing multi-hop evidence chains across modalities. We propose the SAGE (Structure Aware Graph Expansion) framework, which (i) constructs a chunk-level graph offline using metadata-driven similarities with percentile-based pruning, and (ii) performs online retrieval by running an initial baseline retriever to obtain k seed chunks, expanding their first-hop neighbors, and then filtering those neighbors using dense+sparse retrieval to select k' additional chunks. We instantiate the initial retriever using hybrid dense+sparse retrieval for implicit cross-modal corpora and SPARK (Structure Aware Planning Agent for Retrieval over Knowledge Graphs), an agentic retriever, for explicit schema graphs. On OTT-QA and STaRK, SAGE improves retrieval recall by 5.7 and 8.5 points over baselines.
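The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Jaccard similarity over metadata tags, the nearest-rank percentile threshold, the function names `build_graph` and `retrieve`, and the toy chunks and scores are all assumptions made for illustration. A single `query_scores` dictionary stands in for both the baseline retriever and the dense+sparse neighbor filter.

```python
from itertools import combinations


def jaccard(a, b):
    """Toy metadata similarity (assumed): overlap between two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(metadata, percentile=75.0):
    """Offline step: link chunk pairs whose similarity reaches the percentile cut."""
    graph = {chunk: set() for chunk in metadata}
    sims = {(u, v): jaccard(metadata[u], metadata[v])
            for u, v in combinations(sorted(metadata), 2)}
    if not sims:
        return graph
    ranked = sorted(sims.values())
    # Nearest-rank percentile over all pairwise similarities (assumed pruning rule).
    threshold = ranked[min(len(ranked) - 1, int(len(ranked) * percentile / 100))]
    for (u, v), s in sims.items():
        if s > 0 and s >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def retrieve(query_scores, graph, k, k_prime):
    """Online step: k seeds, first-hop expansion, then keep the k' best neighbors."""
    def by_score(chunk):
        return (-query_scores.get(chunk, 0.0), chunk)

    seeds = sorted(query_scores, key=by_score)[:k]
    neighbors = {n for s in seeds for n in graph[s]} - set(seeds)
    extra = sorted(neighbors, key=by_score)[:k_prime]
    return seeds + extra


# Hypothetical chunks: two text passages, one table, one unrelated passage.
metadata = {
    "t1": {"film", "director", "nolan"},
    "t2": {"film", "nolan", "awards"},
    "tab1": {"awards", "oscars", "year"},
    "t3": {"football", "league"},
}
graph = build_graph(metadata)

# Hypothetical query-to-chunk relevance scores.
query_scores = {"t1": 0.9, "t3": 0.5, "t2": 0.4, "tab1": 0.1}
result = retrieve(query_scores, graph, k=1, k_prime=1)
print(result)
```

In this toy setup a flat top-2 ranking would pick the unrelated chunk `t3` as the second result, whereas the expansion step restricts the additional candidates to graph neighbors of the seed, so the connected chunk `t2` is selected instead.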
Related papers
- Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation [53.42323544075114]
We propose GraphAnchor, a novel Graph-Anchored Knowledge Indexing approach. Experiments on four multi-hop question answering benchmarks demonstrate the effectiveness of GraphAnchor.
arXiv Detail & Related papers (2026-01-23T05:41:05Z)
- N2N-GQA: Noise-to-Narrative for Graph-Based Table-Text Question Answering Using LLMs [0.0]
Multi-hop question answering over hybrid table-text data requires retrieving and reasoning across multiple evidence pieces from large corpora. Standard Retrieval-Augmented Generation (RAG) pipelines process documents as flat ranked lists, causing retrieval noise to obscure reasoning chains. N2N-GQA is the first zero-shot framework for open-domain hybrid table-text QA that constructs dynamic evidence graphs from noisy retrieval outputs.
arXiv Detail & Related papers (2026-01-10T15:55:15Z)
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
- GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation [35.65907480060404]
GraphSearch is a novel agentic deep searching workflow with dual-channel retrieval for GraphRAG. GraphSearch consistently improves answer accuracy and generation quality over the traditional strategy.
arXiv Detail & Related papers (2025-09-26T07:45:56Z)
- Query-Aware Graph Neural Networks for Enhanced Retrieval-Augmented Generation [0.0]
We present a novel graph neural network architecture for retrieval-augmented generation (RAG). Our approach constructs per-episode knowledge graphs that capture both sequential and semantic relationships between text chunks. We introduce an Enhanced Graph Attention Network with query-guided pooling that dynamically focuses on relevant parts of the graph based on user queries.
arXiv Detail & Related papers (2025-07-25T19:42:27Z)
- Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval [68.71038700559195]
Chain of Retrieval (COR) is a novel iterative framework for full-paper retrieval. We present SCIBENCH, a benchmark providing both complete and segmented contexts of full papers for queries and candidates.
arXiv Detail & Related papers (2025-07-14T08:41:53Z)
- SlimRAG: Retrieval without Graphs via Entity-Aware Context Selection [38.200971604630524]
SlimRAG is a lightweight framework for retrieval without graphs. It replaces structure-heavy components with a simple yet effective entity-aware mechanism. Experiments show that SlimRAG outperforms strong flat and graph-based baselines in accuracy.
arXiv Detail & Related papers (2025-06-15T15:36:17Z)
- Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval [22.33550491040999]
RAG grounds large language models in external evidence, yet it still falters when answers must be pieced together across semantically distant documents. We build two plug-and-play retrievers: StatementGraphRAG and TopicGraphRAG. Our methods outperform naive chunk-based RAG, achieving an average relative improvement of 23.1% in retrieval recall and correctness.
arXiv Detail & Related papers (2025-06-09T17:58:35Z)
- Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring that subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.