GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
- URL: http://arxiv.org/abs/2504.05478v1
- Date: Mon, 07 Apr 2025 20:16:22 GMT
- Title: GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
- Authors: Alfred Clemedtson, Borun Shi
- Abstract summary: GraphRAFT is a retrieve-and-reason framework that finetunes LLMs to generate provably correct Cypher queries. Our method is the first such solution that can be taken off-the-shelf and used on Knowledge Graphs stored in native graph DBs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have shown remarkable language processing and reasoning ability but are prone to hallucinate when asked about private data. Retrieval-augmented generation (RAG) retrieves relevant data that fit into an LLM's context window and prompts the LLM for an answer. GraphRAG extends this approach to structured Knowledge Graphs (KGs) and questions regarding entities multiple hops away. The majority of recent GraphRAG methods either overlook the retrieval step or have ad hoc retrieval processes that are abstract or inefficient. This prevents them from being adopted when the KGs are stored in graph databases supporting graph query languages. In this work, we present GraphRAFT, a retrieve-and-reason framework that finetunes LLMs to generate provably correct Cypher queries to retrieve high-quality subgraph contexts and produce accurate answers. Our method is the first such solution that can be taken off-the-shelf and used on KGs stored in native graph DBs. Benchmarks suggest that our method is sample-efficient and scales with the availability of training data. It achieves significantly better results than all state-of-the-art models across all four standard metrics on two challenging Q&A benchmarks on large text-attributed KGs.
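As a rough illustration of the retrieve-and-reason loop the abstract describes, the sketch below generates a Cypher query for a question, runs it against a Neo4j database, and answers from the retrieved context. The `generate_cypher` and `generate_answer` helpers are hypothetical stand-ins for calls to a fine-tuned LLM (GraphRAFT's actual training and decoding are not shown); only the `neo4j` driver usage is real API, and the connection details are placeholders.

```python
# Sketch of a retrieve-and-reason loop over a graph database.
# generate_cypher / generate_answer are hypothetical stand-ins for a
# fine-tuned LLM; only the neo4j driver usage below is real API.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder creds

def generate_cypher(question: str) -> str:
    # A fine-tuned LLM would translate the question into Cypher here.
    return ("MATCH (p:Paper)-[:CITES]->(q:Paper) "
            "WHERE p.title CONTAINS $topic RETURN q.title LIMIT 10")

def retrieve(question: str, topic: str) -> list[str]:
    with driver.session() as session:
        result = session.run(generate_cypher(question), topic=topic)
        return [record["q.title"] for record in result]

def generate_answer(question: str, context: list[str]) -> str:
    # A second LLM call would ground the answer in the retrieved subgraph.
    return f"Answer to {question!r} based on {len(context)} retrieved facts."

ctx = retrieve("Which papers does the GraphRAG paper cite?", topic="GraphRAG")
print(generate_answer("Which papers does the GraphRAG paper cite?", ctx))
```

The fine-tuning step is what lets GraphRAFT promise well-formed, schema-consistent Cypher; the stub above sidesteps training entirely and only shows the data flow.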
Related papers
- Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs [22.218522445858344]
Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. We propose DemoGraph, a black-box, context-driven graph data augmentation approach guided by LLMs. Our approach excels in scenarios involving electronic health records (EHRs), demonstrating that it makes full use of contextual knowledge.
arXiv Detail & Related papers (2025-02-19T09:00:32Z)
- GraphSOS: Graph Sampling and Order Selection to Help LLMs Understand Graphs Better [13.742220809751627]
GraphSOS is a novel framework for converting graph data into natural-language text. It features an Order Selector Module that chooses a proper serialization order for the graph and a Subgraph Sampling Module that samples structurally informative subgraphs for better reasoning. Experiments on multiple datasets for node classification and graph question-answering demonstrate that GraphSOS improves LLMs' performance on graph tasks.
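The overall recipe (sample a subgraph, pick a node order, serialize to text) can be sketched in a few lines. The ego-graph sampling and BFS ordering below are illustrative stand-ins for the paper's learned modules, assuming `networkx` for graph handling.

```python
# Illustrative graph-to-text serialization: sample a subgraph around a
# seed node, order its nodes deterministically, and emit edges as plain
# sentences for an LLM prompt. The ego-graph sampling and BFS ordering
# are stand-ins for GraphSOS's learned modules.
import networkx as nx

def serialize_subgraph(G: nx.Graph, seed: int, radius: int = 1) -> str:
    sub = nx.ego_graph(G, seed, radius=radius)      # subgraph sampling
    order = {n: i for i, n in enumerate(nx.bfs_tree(sub, seed))}  # node order
    lines = [f"Subgraph around node {seed} with {sub.number_of_nodes()} nodes."]
    for u, v in sorted(sub.edges(), key=lambda e: (order[e[0]], order[e[1]])):
        lines.append(f"Node {u} is connected to node {v}.")
    return "\n".join(lines)

print(serialize_subgraph(nx.karate_club_graph(), seed=0))
```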
arXiv Detail & Related papers (2025-01-24T11:55:57Z)
- TOBUGraph: Knowledge Graph-Based Retrieval for Enhanced LLM Performance Beyond RAG [3.8704987495086542]
TOBUGraph is a graph-based retrieval framework that first constructs a knowledge graph from unstructured data.
It extracts structured knowledge and diverse relationships among data, going beyond RAG's text-to-text similarity.
We demonstrate TOBUGraph's effectiveness in TOBU, a real-world application in production for personal memory organization and retrieval.
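A toy version of this construct-then-retrieve flow is sketched below, with the LLM extraction step replaced by a hard-coded stub; `extract_triples` is a hypothetical placeholder, not TOBUGraph's API.

```python
# Toy construct-then-retrieve flow: pull (subject, relation, object)
# triples out of unstructured text, load them into a graph, and answer
# by neighborhood lookup instead of text-to-text similarity.
# extract_triples stubs out what would be an LLM extraction call.
import networkx as nx

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Placeholder for LLM-based extraction of structured knowledge.
    return [("Alice", "visited", "Kyoto"), ("Kyoto", "located_in", "Japan")]

G = nx.MultiDiGraph()
for s, r, o in extract_triples("Alice visited Kyoto, a city in Japan."):
    G.add_edge(s, o, relation=r)

def retrieve(entity: str) -> list[str]:
    # Collect every stored fact touching the entity, in either direction.
    out_facts = [f"{u} {d['relation']} {v}" for u, v, d in G.edges(entity, data=True)]
    in_facts = [f"{u} {d['relation']} {v}" for u, v, d in G.in_edges(entity, data=True)]
    return out_facts + in_facts

print(retrieve("Kyoto"))  # ['Kyoto located_in Japan', 'Alice visited Kyoto']
```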
arXiv Detail & Related papers (2024-12-06T22:05:39Z)
- Can LLMs be Good Graph Judger for Knowledge Graph Construction? [33.958327252291]
In this paper, we propose GraphJudger, a knowledge graph construction framework that addresses the aforementioned challenges. We introduce three innovative modules: entity-centric iterative text denoising, knowledge-aware instruction tuning, and graph judgement. Experiments conducted on two general text-graph pair datasets and one domain-specific text-graph pair dataset show superior performance compared to baseline methods.
arXiv Detail & Related papers (2024-11-26T12:46:57Z)
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
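The retrieval-as-conditional-generation idea can be mimicked with a small beam search over relation paths, where a placeholder `score` function stands in for the small language model's conditional path probabilities; the miniature `KG` and scoring heuristic below are purely illustrative.

```python
# Sketch of subgraph retrieval as conditional generation: expand relation
# paths from a topic entity, keeping the top-k candidates at each hop.
# `score` is a stand-in for a small LM's conditional path probability,
# and the miniature KG below is purely illustrative.
KG = {
    "Einstein": [("born_in", "Ulm"), ("field", "Physics")],
    "Ulm": [("located_in", "Germany")],
}

def score(question: str, path: tuple[str, ...]) -> float:
    # Placeholder: the retriever would score P(path | question) instead.
    return sum(1.0 for rel in path if rel.replace("_", " ") in question)

def generate_paths(question: str, entity: str, hops: int = 2, k: int = 2):
    beams = [(entity, ())]  # (current node, relation path so far)
    for _ in range(hops):
        candidates = [(nbr, path + (rel,))
                      for node, path in beams
                      for rel, nbr in KG.get(node, [])]
        beams = sorted(candidates, key=lambda b: -score(question, b[1]))[:k] or beams
    return beams

print(generate_paths("Where is the town Einstein was born in?", "Einstein"))
```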
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [88.4320775961431]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.
Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.
We propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on six widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z)
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-Thought (Graph-CoT) to augment LLMs with graphs by encouraging the LLM to reason over the graph iteratively.
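In skeleton form, the iterative loop is: ask the model which node to expand next, fetch that node's neighbors, append them to the context, and repeat. A minimal sketch, with `llm_pick_next` as a hypothetical stub for the LLM call:

```python
# Skeleton of an iterative reason-on-graph loop: the model picks which
# node to expand next, its neighbors are appended to the context, and
# the process repeats. llm_pick_next is a hypothetical stub for the LLM.
GRAPH = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_D"],
    "paper_C": [],
    "paper_D": [],
}

def llm_pick_next(question, context, frontier):
    # Placeholder: an LLM would reason over the context and pick a node,
    # or return None to stop. Here we just take the oldest frontier node.
    return frontier[0] if frontier else None

def graph_cot(question: str, start: str, max_steps: int = 3) -> list[str]:
    context, frontier, seen = [], [start], {start}
    for _ in range(max_steps):
        node = llm_pick_next(question, context, frontier)
        if node is None:
            break
        frontier.remove(node)
        neighbors = GRAPH.get(node, [])
        context.append(f"{node} links to: {neighbors if neighbors else 'nothing'}")
        frontier += [n for n in neighbors if n not in seen]
        seen.update(neighbors)
    return context

print(graph_cot("Which papers does paper_A transitively cite?", "paper_A"))
```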
arXiv Detail & Related papers (2024-04-10T15:41:53Z)
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
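The exact solver is beyond a short sketch, but a greedy toy heuristic conveys the objective: grow a connected subgraph that collects high-prize (query-relevant) nodes while paying edge costs. The graph, prizes, and greedy rule below are illustrative only; a production system would use a dedicated PCST solver.

```python
# Greedy toy version of the Prize-Collecting Steiner Tree objective:
# grow a connected subgraph that collects node prizes (query relevance)
# while paying edge costs. Real systems use a dedicated PCST solver;
# the graph and numbers here are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("q", "a", 1.0), ("a", "b", 2.0), ("q", "c", 0.5)])
prize = {"q": 3.0, "a": 1.5, "b": 2.5, "c": 0.2}  # e.g. similarity to query

def greedy_pcst(G: nx.Graph, prize: dict, root: str) -> set:
    tree = {root}
    while True:
        best, best_gain = None, 0.0
        for u in tree:
            for v in G.neighbors(u):
                if v in tree:
                    continue
                gain = prize[v] - G[u][v]["weight"]  # prize minus edge cost
                if gain > best_gain:
                    best, best_gain = v, gain
        if best is None:  # no remaining neighbor is worth its edge cost
            return tree
        tree.add(best)

print(greedy_pcst(G, prize, root="q"))  # contains 'q', 'a', 'b'
```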
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
- Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z)
- GraphGPT: Graph Instruction Tuning for Large Language Models [27.036935149004726]
Graph Neural Networks (GNNs) have evolved to understand graph structures.
To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation.
Our research tackles this challenge by improving graph model generalization in zero-shot learning environments.
arXiv Detail & Related papers (2023-10-19T06:17:46Z)
- Integrating Graphs with Large Language Models: Methods and Prospects [68.37584693537555]
Large language models (LLMs) have emerged as frontrunners, showcasing unparalleled prowess in diverse applications.
Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest.
This paper divides such integrations into two predominant categories.
arXiv Detail & Related papers (2023-10-09T07:59:34Z)