RAKG:Document-level Retrieval Augmented Knowledge Graph Construction
- URL: http://arxiv.org/abs/2504.09823v1
- Date: Mon, 14 Apr 2025 02:47:23 GMT
- Title: RAKG:Document-level Retrieval Augmented Knowledge Graph Construction
- Authors: Hairong Zhang, Jiaheng Si, Guohang Yan, Boyuan Qi, Pinlong Cai, Song Mao, Ding Wang, Botian Shi
- Abstract summary: This paper focuses on the task of automatic document-level knowledge graph construction. It proposes the Document-level Retrieval Augmented Knowledge Graph Construction (RAKG) framework.
- Score: 10.013667560362565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of knowledge-graph-based retrieval-augmented generation (RAG) techniques such as GraphRAG and Pike-RAG, the role of knowledge graphs in enhancing the reasoning capabilities of large language models (LLMs) has become increasingly prominent. However, traditional Knowledge Graph Construction (KGC) methods face challenges like complex entity disambiguation, rigid schema definition, and insufficient cross-document knowledge integration. This paper focuses on the task of automatic document-level knowledge graph construction. It proposes the Document-level Retrieval Augmented Knowledge Graph Construction (RAKG) framework. RAKG extracts pre-entities from text chunks and utilizes these pre-entities as queries for RAG, effectively addressing the issue of long-context forgetting in LLMs and reducing the complexity of Coreference Resolution. In contrast to conventional KGC methods, RAKG more effectively captures global information and the interconnections among disparate nodes, thereby enhancing the overall performance of the model. Additionally, we transfer the RAG evaluation framework to the KGC field and filter and evaluate the generated knowledge graphs, thereby avoiding incorrectly generated entities and relationships caused by hallucinations in LLMs. We further developed the MINE dataset by constructing standard knowledge graphs for each article and experimentally validated the performance of RAKG. The results show that RAKG achieves an accuracy of 95.91% on the MINE dataset, a 6.2 percentage point improvement over the current best baseline, GraphRAG (89.71%). The code is available at https://github.com/LMMApplication/RAKG.
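The retrieval step described in the abstract (pre-entities extracted from text chunks, then used as queries against the same document) can be illustrated with a toy sketch. This is not the RAKG implementation: the function names are hypothetical, the capitalized-token extractor stands in for the LLM-based pre-entity extraction, and the substring-count retriever stands in for a real RAG retriever.

```python
import re


def split_into_chunks(text: str) -> list[str]:
    """Split text into sentence-level chunks (assumed stand-in for real chunking)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_pre_entities(chunk: str) -> set[str]:
    """Toy pre-entity extractor: every capitalized token counts as a candidate.

    Assumption: a real system would prompt an LLM here instead.
    """
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", chunk))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how often the query string occurs in them."""
    needle = query.lower()
    scored = [(chunk.lower().count(needle), idx, chunk) for idx, chunk in enumerate(chunks)]
    hits = [item for item in scored if item[0] > 0]
    # Highest count first; ties keep document order.
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in hits[:top_k]]


def build_entity_contexts(text: str, top_k: int = 2) -> dict[str, list[str]]:
    """Collect pre-entities from all chunks, then use each one as a retrieval query."""
    chunks = split_into_chunks(text)
    pre_entities: set[str] = set()
    for chunk in chunks:
        pre_entities |= extract_pre_entities(chunk)
    return {entity: retrieve(entity, chunks, top_k) for entity in sorted(pre_entities)}


if __name__ == "__main__":
    document = (
        "Marie Curie studied radioactivity. "
        "Curie won the Nobel Prize in 1903. "
        "Paris hosted her laboratory."
    )
    for entity, context in build_entity_contexts(document).items():
        print(entity, "->", context)
```

Because each pre-entity pulls in every chunk that mentions it, mentions scattered across a long document end up in one context, which is the intuition behind the abstract's claim about reducing long-context forgetting and coreference complexity.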
Related papers
- NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes [25.173078967881803]
Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora.
Current graph-based RAG approaches seldom prioritize the design of graph structures.
Inadequately designed graphs not only impede the seamless integration of diverse graph algorithms but also result in workflow inconsistencies.
We propose NodeRAG, a graph-centric framework introducing heterogeneous graph structures.
arXiv Detail & Related papers (2025-04-15T18:24:00Z)
- RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline.
RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components.
Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z)
- G-OSR: A Comprehensive Benchmark for Graph Open-Set Recognition [54.45837774534411]
We introduce G-OSR, a benchmark for evaluating Graph Open-Set Recognition (GOSR) methods at both the node and graph levels.
Results offer critical insights into the generalizability and limitations of current GOSR methods.
arXiv Detail & Related papers (2025-03-01T13:02:47Z)
- RAG vs. GraphRAG: A Systematic Evaluation and Key Insights [42.31801859160484]
We systematically evaluate Retrieval-Augmented Generation (RAG) and GraphRAG on text-based benchmarks.
Our results highlight the distinct strengths of RAG and GraphRAG across different tasks and evaluation perspectives.
arXiv Detail & Related papers (2025-02-17T02:36:30Z)
- ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation [16.204046295248546]
Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models.
We introduce a novel graph-based RAG approach, called Attributed Community-based Hierarchical RAG (ArchRAG).
We build a novel hierarchical index structure for the attributed communities and develop an effective online retrieval method.
arXiv Detail & Related papers (2025-02-14T03:28:36Z)
- GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation [84.41557981816077]
We introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation.
GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships.
It achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws.
arXiv Detail & Related papers (2025-02-03T07:04:29Z)
- Retrieval-Augmented Generation with Graphs (GraphRAG) [84.29507404866257]
Retrieval-augmented generation (RAG) is a powerful technique that enhances downstream task execution by retrieving additional information.
Graph, by its intrinsic "nodes connected by edges" nature, encodes massive heterogeneous and relational information.
Unlike conventional RAG, the uniqueness of graph-structured data, such as diverse-formatted and domain-specific relational knowledge, poses unique and significant challenges when designing GraphRAG for different domains.
arXiv Detail & Related papers (2024-12-31T06:59:35Z)
- LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration [17.514586423233872]
We propose LEGO-GraphRAG, a modular framework that enables fine-grained decomposition of the GraphRAG workflow.
Our framework facilitates comprehensive empirical studies of GraphRAG on large-scale real-world graphs and diverse query sets.
arXiv Detail & Related papers (2024-11-06T15:32:28Z)
- Graph Retrieval-Augmented Generation: A Survey [28.979898837538958]
Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining.
This paper provides the first comprehensive overview of GraphRAG methodologies.
We formalize the GraphRAG workflow, encompassing Graph-Based Indexing, Graph-Guided Retrieval, and Graph-Enhanced Generation.
arXiv Detail & Related papers (2024-08-15T12:20:24Z)
- Hi-GMAE: Hierarchical Graph Masked Autoencoders [90.30572554544385]
Hierarchical Graph Masked AutoEncoders (Hi-GMAE)
Hi-GMAE is a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs.
Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
arXiv Detail & Related papers (2024-05-17T09:08:37Z)
- Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network [111.67744771462873]
This paper proposes a novel explainable model for sparse Knowledge Graphs (KGs).
It combines high-order reasoning into a graph convolutional network, namely HoGRN.
It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability.
arXiv Detail & Related papers (2022-07-14T10:16:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.