Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
- URL: http://arxiv.org/abs/2511.04473v1
- Date: Thu, 06 Nov 2025 15:45:18 GMT
- Title: Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
- Authors: Alberto Cattaneo, Carlo Luschi, Daniel Justus,
- Abstract summary: SynthKGQA is a framework for generating high-quality synthetic Knowledge Graph Question Answering datasets from any Knowledge Graph.<n>We show how, in addition to enabling more informative benchmarking of KG retrievers, the data produced with SynthKGQA also allows us to train better models.
- Score: 3.222543736797976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of challenging QA datasets with ground-truth targets for graph retrieval. We present SynthKGQA, a framework for generating high-quality synthetic Knowledge Graph Question Answering datasets from any Knowledge Graph, providing the full set of ground-truth facts in the KG to reason over each question. We show how, in addition to enabling more informative benchmarking of KG retrievers, the data produced with SynthKGQA also allows us to train better models. We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset designed to test zero-shot generalization abilities of KG retrievers with respect to unseen graph structures and relation types, and benchmark popular solutions for KG-augmented LLMs on it.
Related papers
- Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need? [57.28763506780752]
GraphFlow is a framework that efficiently retrieves accurate and diverse knowledge required for real-world queries from text-rich KGs.<n>It outperforms strong KG-RAG baselines, including GPT-4o, by 10% on average in hit rate and recall.<n>It also shows strong generalization to unseen KGs, demonstrating its effectiveness and robustness.
arXiv Detail & Related papers (2025-10-18T17:06:49Z) - Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA)<n>We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures.<n>Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap.<n>We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs, bridge the semantic gap between graphs and queries.
arXiv Detail & Related papers (2025-09-25T06:48:52Z) - RAKG:Document-level Retrieval Augmented Knowledge Graph Construction [10.013667560362565]
This paper focuses on the task of automatic document-level knowledge graph construction.<n>It proposes the Document-level Retrieval Augmented Knowledge Graph Construction (RAKG) framework.
arXiv Detail & Related papers (2025-04-14T02:47:23Z) - Talking to GDELT Through Knowledge Graphs [0.6153162958674417]
We study various Retrieval Augmented Regeneration (RAG) approaches to gain an understanding of the strengths and weaknesses of each approach in a question-answering analysis.<n>To retrieve information from the text corpus we implement a traditional vector store RAG as well as state-of-the-art large language model (LLM) based approaches.
arXiv Detail & Related papers (2025-03-10T17:48:10Z) - Grounding LLM Reasoning with Knowledge Graphs [4.279373869671241]
We propose integrating reasoning strategies with Knowledge Graphs to anchor every step or "thought" of the reasoning chains in KG data.<n>We evaluate both agentic and automated search methods across several reasoning strategies, including Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT)<n>Our experiments demonstrate that this approach consistently outperforms baseline models.
arXiv Detail & Related papers (2025-02-18T19:20:46Z) - GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion [52.026016846945424]
We propose a new method called GLTW, which encodes the structural information of KGs and merges it with Large Language Models.<n>Specifically, we introduce an improved Graph Transformer (iGT) that effectively encodes subgraphs with both local and global structural information.<n>Also, we develop a subgraph-based multi-classification training objective, using all entities within KG as classification objects, to boost learning efficiency.
arXiv Detail & Related papers (2025-02-17T06:02:59Z) - Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base.<n>We ground generation of KG with the authored ontology based on extracted relations to ensure consistency and interpretability.<n>Our work presents a promising direction for scalable KG construction pipeline with minimal human intervention, that yields high quality and human-interpretable KGs.
arXiv Detail & Related papers (2024-12-30T13:36:05Z) - Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency [59.6772484292295]
Knowledge graphs (KGs) generated by large language models (LLMs) are increasingly valuable for Retrieval-Augmented Generation (RAG) applications.
Existing KG extraction methods rely on prompt-based approaches, which are inefficient for processing large-scale corpora.
We propose SynthKG, a multi-step, document-level synthesis KG workflow based on LLMs.
We also design a novel graph-based retrieval framework for RAG.
arXiv Detail & Related papers (2024-10-22T00:47:54Z) - Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering [87.67177556994525]
We propose a training-free method called Generate-on-Graph (GoG) to generate new factual triples while exploring Knowledge Graphs (KGs)
GoG performs reasoning through a Thinking-Searching-Generating framework, which treats LLM as both Agent and KG in IKGQA.
arXiv Detail & Related papers (2024-04-23T04:47:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.