Schema Generation for Large Knowledge Graphs Using Large Language Models
- URL: http://arxiv.org/abs/2506.04512v1
- Date: Wed, 04 Jun 2025 23:25:16 GMT
- Title: Schema Generation for Large Knowledge Graphs Using Large Language Models
- Authors: Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño-Peñuela, Elena Simperl
- Abstract summary: We explore automatic schema generation using large language models (LLMs). Our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.
- Score: 5.764388991407566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Schemas are vital for ensuring data quality in the Semantic Web and natural language processing. Traditionally, their creation demands substantial involvement from knowledge engineers and domain experts. Leveraging the impressive capabilities of large language models (LLMs) in related tasks like ontology engineering, we explore automatic schema generation using LLMs. To bridge the resource gap, we introduce two datasets: YAGO Schema and Wikidata EntitySchema, along with evaluation metrics. The LLM-based pipelines effectively utilize local and global information from knowledge graphs (KGs) to generate validating schemas in Shape Expressions (ShEx). Experiments demonstrate LLMs' strong potential in producing high-quality ShEx schemas, paving the way for scalable, automated schema generation for large KGs. Furthermore, our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.
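To make the target format concrete, the following minimal sketch shows what a ShEx shape derived from KG triples can look like. It is not the paper's LLM-based pipeline: it swaps in a simple frequency heuristic that infers cardinalities from how often each predicate occurs per subject. The function name `infer_shape`, the `ex:` prefix, and the sample triples are hypothetical, and the prefix is assumed to be declared elsewhere in the schema.

```python
from collections import defaultdict


def infer_shape(shape_name, triples):
    """Build a ShExC shape from (subject, predicate, object) triples.

    Hypothetical helper for illustration only: cardinalities come from
    per-subject predicate counts, not from an LLM.
    """
    counts = defaultdict(lambda: defaultdict(int))  # predicate -> subject -> count
    subjects = set()
    for s, p, _o in triples:
        subjects.add(s)
        counts[p][s] += 1

    constraints = []
    for p in sorted(counts):
        per_subject = [counts[p].get(s, 0) for s in subjects]
        lo, hi = min(per_subject), max(per_subject)
        if lo >= 1 and hi == 1:
            card = ""    # exactly one (the ShEx default)
        elif lo == 0 and hi == 1:
            card = " ?"  # optional
        elif lo >= 1:
            card = " +"  # one or more
        else:
            card = " *"  # zero or more
        # "." is the ShEx wildcard: any value is accepted for this predicate.
        constraints.append(f"  {p} .{card}")

    body = " ;\n".join(constraints)
    return f"{shape_name} {{\n{body}\n}}"


# Assumed sample data: two instances of a hypothetical Person class.
sample_triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:birthDate", "1990-01-01"),
]
print(infer_shape("ex:PersonShape", sample_triples))
```

For the sample data, the sketch marks `ex:name` as required, `ex:birthDate` as optional (`?`), and `ex:knows` as repeatable (`*`). A realistic schema would also constrain value types, which is part of what makes the task hard for purely statistical approaches.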
Related papers
- Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z)
- AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora [51.77079220622184]
We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction. We leverage large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text. We construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges.
arXiv Detail & Related papers (2025-05-29T16:34:58Z)
- LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models [0.22470290096767]
Traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction.
arXiv Detail & Related papers (2025-04-01T13:03:33Z)
- NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models [26.739650151993928]
Graphs are a fundamental data structure for representing relationships in real-world scenarios.
Applying Large Language Models (LLMs) to graph-related tasks poses significant challenges.
We introduce Node Tokenizer for Large Language Models (NT-LLM), a novel framework that efficiently encodes graph structures.
arXiv Detail & Related papers (2024-10-14T17:21:57Z)
- All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches the message-passing procedure of graph learning by enhancing a limited fraction of nodes from the graph.
arXiv Detail & Related papers (2024-07-20T22:09:42Z)
- LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling [10.907949155931474]
We introduce LangTopo, which aligns graph structure modeling with natural language understanding at the token level.
We demonstrate the effectiveness of our proposed method on multiple datasets.
arXiv Detail & Related papers (2024-06-19T06:20:22Z)
- Exploring the Potential of Large Language Models in Graph Generation [51.046188600990014]
Graph generation requires large language models (LLMs) to generate graphs with given properties.
This paper explores the abilities of LLMs for graph generation through systematic task designs and experiments.
Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks.
arXiv Detail & Related papers (2024-03-21T12:37:54Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs [59.74814230246034]
Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities.
We investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors.
arXiv Detail & Related papers (2023-07-07T05:31:31Z)