Related papers: LLM-Driven Ontology Construction for Enterprise Knowledge Graphs

URL: http://arxiv.org/abs/2602.01276v1
Date: Sun, 01 Feb 2026 15:13:30 GMT
Title: LLM-Driven Ontology Construction for Enterprise Knowledge Graphs
Authors: Abdulsobur Oyewale, Tommaso Soru
Abstract summary: This paper introduces OntoEKG, a pipeline designed to accelerate the generation of domain-specific ontologies from unstructured enterprise data. Our approach decomposes the modelling task into two distinct phases: an extraction module that identifies core classes and properties, and an entailment module that logically structures these elements into a hierarchy before serialising them into standard RDF. Addressing the significant lack of comprehensive benchmarks for end-to-end ontology construction, we adopt a new evaluation dataset derived from documents across the Data, Finance, and Logistics sectors.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Enterprise Knowledge Graphs have become essential for unifying heterogeneous data and enforcing semantic governance. However, the construction of their underlying ontologies remains a resource-intensive, manual process that relies heavily on domain expertise. This paper introduces OntoEKG, an LLM-driven pipeline designed to accelerate the generation of domain-specific ontologies from unstructured enterprise data. Our approach decomposes the modelling task into two distinct phases: an extraction module that identifies core classes and properties, and an entailment module that logically structures these elements into a hierarchy before serialising them into standard RDF. Addressing the significant lack of comprehensive benchmarks for end-to-end ontology construction, we adopt a new evaluation dataset derived from documents across the Data, Finance, and Logistics sectors. Experimental results highlight both the potential and the challenges of this approach, achieving a fuzzy-match F1-score of 0.724 in the Data domain while revealing limitations in scope definition and hierarchical reasoning.
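
The abstract describes the entailment module as arranging the extracted classes and properties into a hierarchy and then serialising them into standard RDF. The paper's implementation is not reproduced here, so the following is only a minimal Python sketch of that final step under stated assumptions: the hierarchy arrives as a hypothetical class-to-superclass mapping, properties arrive as hypothetical (name, domain, range) triples, the namespace BASE is a placeholder, and the output is hand-written Turtle using the standard owl: and rdfs: vocabularies rather than any particular RDF library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical namespace for the generated ontology (an assumption, not from OntoEKG).
BASE = "http://example.org/ekg#"


def check_acyclic(parents: Dict[str, Optional[str]]) -> None:
    """Raise ValueError if following superclass links ever revisits a class."""
    for start in parents:
        seen = set()
        node: Optional[str] = start
        while node is not None:
            if node in seen:
                raise ValueError(f"subclass cycle through {node!r}")
            seen.add(node)
            node = parents.get(node)


def to_turtle(parents: Dict[str, Optional[str]],
              properties: List[Tuple[str, str, str]]) -> str:
    """Serialise a class hierarchy and object properties as Turtle.

    parents maps each class to its superclass (None for a root);
    properties are hypothetical (name, domain_class, range_class) triples.
    """
    check_acyclic(parents)
    # Declare classes that appear only as superclasses as well.
    classes = set(parents) | {p for p in parents.values() if p is not None}
    lines = [
        f"@prefix : <{BASE}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for cls in sorted(classes):
        parent = parents.get(cls)
        if parent is None:
            lines.append(f":{cls} a owl:Class .")
        else:
            lines.append(f":{cls} a owl:Class ; rdfs:subClassOf :{parent} .")
    for name, domain, rng in properties:
        lines.append(f":{name} a owl:ObjectProperty ; "
                     f"rdfs:domain :{domain} ; rdfs:range :{rng} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle({"Invoice": "Document", "Supplier": "Party"},
                    [("issuedBy", "Invoice", "Supplier")]))
```

With the example input, each child class gets an rdfs:subClassOf triple, the root classes Document and Party are declared on their own, and issuedBy receives domain and range statements. The acyclicity check is there because a subclass cycle entails that the classes involved are equivalent, which an extracted hierarchy rarely intends; a production pipeline would more likely build the graph with an RDF library such as rdflib and validate it with a reasoner.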

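The abstract reports a fuzzy-match F1-score of 0.724 in the Data domain but does not spell out the matching rule. One plausible reading, sketched here as an assumption rather than the paper's evaluation protocol, is to normalise class labels, count a predicted label as a hit when its difflib similarity ratio to a still-unmatched gold label reaches a threshold (0.8 here, chosen arbitrarily), and compute F1 from the resulting one-to-one matches. The function names normalise and fuzzy_f1 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def normalise(label: str) -> str:
    """Lower-case and keep only letters/digits, so normalise('Purchase_Order') == 'purchaseorder'."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def fuzzy_f1(predicted: List[str], gold: List[str], threshold: float = 0.8) -> float:
    """Greedy one-to-one fuzzy matching of labels, then F1 over the matches.

    The similarity measure and the 0.8 threshold are assumptions for illustration.
    """
    unmatched = [normalise(g) for g in gold]
    true_positives = 0
    for label in predicted:
        p = normalise(label)
        best_idx, best_score = -1, threshold
        for i, g in enumerate(unmatched):
            score = SequenceMatcher(None, p, g).ratio()
            if score >= best_score:  # keep the most similar gold label at or above threshold
                best_idx, best_score = i, score
        if best_idx >= 0:
            true_positives += 1
            unmatched.pop(best_idx)  # each gold label may be matched only once
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = ["Invoice", "Purchase_Order", "Vendor"]
    gold = ["Invoice", "PurchaseOrder", "Supplier", "Shipment"]
    print(round(fuzzy_f1(pred, gold), 3))
```

In the example, Invoice and Purchase_Order match exactly after normalisation while Vendor matches nothing, so precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571). Greedy matching is the simplest choice; an optimal assignment or an embedding-based similarity could yield different scores on the same labels.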
Related papers

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them. This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks. We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning (standardization, error processing, and imputation), data integration, and data enrichment.
arXiv (2026-01-22T12:02:45Z)
Beyond Human Annotation: Recent Advances in Data Generation Methods for Document Intelligence
This survey establishes the first comprehensive technical map for data generation in Document Intelligence. Data generation is redefined as supervisory signal production. A novel taxonomy is introduced based on the "availability of data and labels".
arXiv (2026-01-18T09:01:18Z)
Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection
We propose Cognitive-YOLO, a novel framework for architecture synthesis driven by Large Language Models (LLMs). Our method consists of three stages. First, an analysis module extracts key meta-features from the target dataset. Second, the LLM reasons upon these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), to synthesize the architecture in a structured Neural Architecture Description Language (NADL). Third, a compiler instantiates this description into a deployable model.
arXiv (2025-12-13T10:52:54Z)
Ontology-Based Knowledge Graph Framework for Industrial Standard Documents via Hierarchical and Propositional Structuring
Ontology-based knowledge graph (KG) construction is a core technology that enables multidimensional understanding and advanced reasoning over domain knowledge. In this study, we propose a method that organizes industrial standard documents into a hierarchical semantic structure. Our approach captures both the hierarchical and logical structures of these documents, effectively representing domain-specific semantics.
arXiv (2025-12-09T09:26:37Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Document understanding is critical for applications ranging from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but the multimodal nature of documents, which combine text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv (2025-10-17T02:33:16Z)
LLM/Agent-as-Data-Analyst: A Survey
Large language models (LLMs) and agent techniques have brought a fundamental shift in the functionality and development paradigm of data analysis tasks. LLMs enable complex data understanding, natural language interfaces, semantic analysis functions, and autonomous pipeline orchestration.
arXiv (2025-09-28T17:31:38Z)
From Parameters to Performance: A Data-Driven Study on LLM Structure and Development
Large language models (LLMs) have achieved remarkable success across various domains. Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. We present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks.
arXiv (2025-09-14T12:20:39Z)
Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction
The Arranged and Organized Extraction (AOE) Benchmark is designed to evaluate the ability of large language models to comprehend fragmented documents. AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. Results show that even the most advanced models struggled significantly.
arXiv (2025-07-22T06:37:51Z)
On Synthetic Data Strategies for Domain-Specific Generative Retrieval
We study synthetic data strategies for a two-stage training framework for domain-specific generative retrieval. In the first stage, we learn to decode document identifiers from queries. In the second stage, we refine document ranking through preference learning.
arXiv (2025-02-25T08:27:54Z)
How to Make LLMs Strong Node Classifiers?
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv (2024-10-03T08:27:54Z)
Fine-tuning Large Enterprise Language Models via Ontological Reasoning
Large Language Models (LLMs) exploit fine-tuning as a technique to adapt to diverse goals, thanks to task-specific training data. We propose a novel neurosymbolic architecture that leverages the power of ontological reasoning to build task- and domain-specific corpora for LLM fine-tuning.
arXiv (2023-06-19T06:48:45Z)
Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for weakly supervised object localization (WSOL). In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network. In the second stage, we propose a post-processing approach, termed the self-correlation map generating (SCG) module, to obtain structure-preserving localization maps.
arXiv (2021-03-08T03:04:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.