AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
- URL: http://arxiv.org/abs/2505.23628v3
- Date: Fri, 01 Aug 2025 06:39:45 GMT
- Title: AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
- Authors: Jiaxin Bai, Wei Fan, Qi Hu, Qing Zong, Chunyang Li, Hong Ting Tsang, Hongyu Luo, Yauwai Yim, Haoyu Huang, Xiao Zhou, Feng Qin, Tianshi Zheng, Xi Peng, Xin Yao, Huiwen Yang, Leijie Wu, Yi Ji, Gong Zhang, Renhai Chen, Yangqiu Song,
- Abstract summary: We present AutoKG, a framework for fully autonomous knowledge graph construction.<n>We leverage large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text.<n>We construct ATLAS (Automated Triple Linking And Induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges.
- Score: 51.77079220622184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text, modeling both entities and events while employing conceptualization to organize instances into semantic categories. Processing over 50 million documents, we construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges. This approach outperforms state-of-the-art baselines on multi-hop QA tasks and enhances LLM factuality. Notably, our schema induction achieves 92\% semantic alignment with human-crafted schemas with zero manual intervention, demonstrating that billion-scale knowledge graphs with dynamically induced schemas can effectively complement parametric knowledge in large language models.
Related papers
- Schema Generation for Large Knowledge Graphs Using Large Language Models [5.764388991407566]
We explore automatic schema generation using large language models (LLMs)<n>Our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.
arXiv Detail & Related papers (2025-06-04T23:25:16Z) - LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models [0.22470290096767]
Traditional schema mining relies on semi-structured data, limiting scalability.<n>This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction.
arXiv Detail & Related papers (2025-04-01T13:03:33Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - A Hierarchical Language Model For Interpretable Graph Reasoning [47.460255447561906]
We introduce Hierarchical Language Model for Graph (HLM-G), which employs a two-block architecture to capture node-centric local information and interaction-centric global structure.
The proposed scheme allows LLMs to address various graph queries with high efficacy, efficiency, and robustness, while reducing computational costs on large-scale graph tasks.
Comprehensive evaluations across diverse graph reasoning and real-world tasks of node, link, and graph-levels highlight the superiority of our method.
arXiv Detail & Related papers (2024-10-29T00:28:02Z) - LLM-Based Multi-Agent Systems are Scalable Graph Generative Models [73.28294528654885]
GraphAgent-Generator (GAG) is a novel simulation-based framework for dynamic, text-attributed social graph generation.<n>GAG simulates the temporal node and edge generation processes for zero-shot social graph generation.<n>The resulting graphs exhibit adherence to seven key macroscopic network properties, achieving an 11% improvement in microscopic graph structure metrics.
arXiv Detail & Related papers (2024-10-13T12:57:08Z) - Open-Domain Hierarchical Event Schema Induction by Incremental Prompting
and Verification [81.17473088621209]
We treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs)
We design an incremental prompting and verification method to break down the construction of a complex event graph into three stages.
Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations.
arXiv Detail & Related papers (2023-07-05T01:00:44Z) - Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema.
We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z) - KGLM: Integrating Knowledge Graph Structure in Language Models for Link
Prediction [0.0]
We introduce a new entity/relation embedding layer that learns to differentiate distinctive entity and relation types.
We show that further pre-training the language models with this additional embedding layer using the triples extracted from the knowledge graph, followed by the standard fine-tuning phase sets a new state-of-the-art performance for the link prediction task on the benchmark datasets.
arXiv Detail & Related papers (2022-11-04T20:38:12Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph
Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.