Related papers: WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

URL: http://arxiv.org/abs/2602.02053v2
Date: Tue, 03 Feb 2026 06:46:26 GMT
Title: WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora
Authors: Pengyu Wang, Benfeng Xu, Licheng Zhang, Shaohan Wang, Mingxuan Du, Chiwei Zhu, Zhendong Mao
Abstract summary: Graph-based Retrieval-Augmented Generation (GraphRAG) organizes external knowledge as a hierarchical graph. Many existing benchmarks for GraphRAG rely on short, curated passages as external knowledge. We introduce WildGraphBench, a benchmark designed to assess GraphRAG performance in the wild.
Score: 34.720109050809285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) organizes external knowledge as a hierarchical graph, enabling efficient retrieval and aggregation of scattered evidence across multiple documents. However, many existing benchmarks for GraphRAG rely on short, curated passages as external knowledge, failing to adequately evaluate systems in realistic settings involving long contexts and large-scale heterogeneous documents. To bridge this gap, we introduce WildGraphBench, a benchmark designed to assess GraphRAG performance in the wild. We leverage Wikipedia's unique structure, where cohesive narratives are grounded in long and heterogeneous external reference documents, to construct a benchmark reflecting real-world scenarios. Specifically, we sample articles across 12 top-level topics, using their external references as the retrieval corpus and citation-linked statements as ground truth, resulting in 1,100 questions spanning three levels of complexity: single-fact QA, multi-fact QA, and section-level summarization. Experiments across multiple baselines reveal that current GraphRAG pipelines help on multi-fact aggregation when evidence comes from a moderate number of sources, but this aggregation paradigm may overemphasize high-level statements at the expense of fine-grained details, leading to weaker performance on summarization tasks. Project page: https://github.com/BstWPY/WildGraphBench.
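
Illustrative sketch (not from the paper): the abstract describes questions whose ground truth is a set of citation-linked Wikipedia statements, grouped into three complexity levels. One way such a record might be represented is shown below. All class names, field names, and the classification rule are hypothetical assumptions made for illustration; the benchmark's actual schema is documented on the project page.

```python
from dataclasses import dataclass, field
from enum import Enum


class QuestionType(Enum):
    # The three complexity levels named in the abstract.
    SINGLE_FACT = "single-fact QA"
    MULTI_FACT = "multi-fact QA"
    SUMMARIZATION = "section-level summarization"


@dataclass
class CitedStatement:
    """A Wikipedia statement plus the external references it cites (hypothetical schema)."""
    text: str
    reference_urls: list[str]


@dataclass
class BenchmarkItem:
    """One question with its citation-linked gold statements (hypothetical schema)."""
    question: str
    question_type: QuestionType
    gold_statements: list[CitedStatement] = field(default_factory=list)

    def evidence_sources(self) -> set[str]:
        """Distinct reference documents backing the gold statements."""
        return {url for s in self.gold_statements for url in s.reference_urls}


def classify(statements: list[CitedStatement], whole_section: bool = False) -> QuestionType:
    """Assumed rule for illustration only: a whole section maps to summarization,
    one statement to single-fact QA, several statements to multi-fact QA."""
    if whole_section:
        return QuestionType.SUMMARIZATION
    if len(statements) == 1:
        return QuestionType.SINGLE_FACT
    return QuestionType.MULTI_FACT
```

Counting distinct evidence sources per item is one way to examine the abstract's observation that GraphRAG helps most when evidence comes from a moderate number of sources.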

Related papers

Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation [53.42323544075114]
We propose GraphAnchor, a novel Graph-Anchored Knowledge Indexing approach. Experiments on four multi-hop question answering benchmarks demonstrate the effectiveness of GraphAnchor.
arXiv Detail & Related papers (2026-01-23T05:41:05Z)
Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs [7.943264761730892]
Multi-Agent GraphRAG serves as a natural language interface to LPG-based graph data. Iterative content-aware correction and normalization, reinforced by an aggregated feedback loop, ensures both semantic and syntactic refinement of generated queries. This highlights how such an approach can bridge AI with real-world applications at scale, enabling industrial digital automation use cases.
arXiv Detail & Related papers (2025-11-11T14:04:00Z)
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora [17.929144506419064]
Retrieval-Augmented Generation (RAG) is widely used to mitigate hallucinations of Large Language Models (LLMs) by leveraging external knowledge. Existing graph-based RAG methods rely on unstable and costly relation extraction for graph construction. We propose LinearRAG, an efficient framework that enables reliable graph construction and precise passage retrieval.
arXiv Detail & Related papers (2025-10-11T08:43:45Z)
Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval [22.33550491040999]
RAG grounds large language models in external evidence, yet it still falters when answers must be pieced together across semantically distant documents. We build two plug-and-play retrievers: StatementGraphRAG and TopicGraphRAG. Our methods outperform naive chunk-based RAG, achieving an average relative improvement of 23.1% in retrieval recall and correctness.
arXiv Detail & Related papers (2025-06-09T17:58:35Z)
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation [31.930889441883732]
Graph retrieval-augmented generation (GraphRAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) with external knowledge. Recent studies report that GraphRAG frequently underperforms vanilla RAG on many real-world tasks. This raises a critical question: Is GraphRAG really effective, and in which scenarios do graph structures provide measurable benefits for RAG systems?
arXiv Detail & Related papers (2025-06-06T02:37:47Z)
E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness [15.829377965705746]
We propose E2GraphRAG, a streamlined graph-based RAG framework. E2GraphRAG achieves up to 10 times faster indexing than GraphRAG and 100 times speedup over LightRAG in retrieval.
arXiv Detail & Related papers (2025-05-30T05:27:40Z)
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework in the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z)
Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z)
ZOGRASCOPE: A New Benchmark for Semantic Parsing over Property Graphs [3.0748861313823]
Property graphs (PGs) have seen increased adoption as a means of representing complex structured information. Despite their growing popularity in industry, PGs remain relatively underrepresented in semantic parsing research. We introduce ZOGRASCOPE, a benchmark designed specifically for PGs and queries written in Cypher.
arXiv Detail & Related papers (2025-03-07T09:33:30Z)
RAG vs. GraphRAG: A Systematic Evaluation and Key Insights [53.83444096699458]
We systematically evaluate Retrieval-Augmented Generation (RAG) and GraphRAG on text-based benchmarks. Our results highlight the distinct strengths of RAG and GraphRAG across different tasks and evaluation perspectives.
arXiv Detail & Related papers (2025-02-17T02:36:30Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We propose TAT-DQA, i.e., answering questions over a visually-rich table-text document. Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability. We conduct extensive experiments on the TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score, respectively, on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.