From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development
- URL: http://arxiv.org/abs/2602.00699v1
- Date: Sat, 31 Jan 2026 12:50:23 GMT
- Title: From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development
- Authors: Xuan Liu, Ziyu Li, Mu He, Ziyang Ma, Xiaoxu Wu, Gizem Yilmaz, Yiyuan Xia, Bingbing Li, He Tan, Jerry Ying Hsi Fuh, Wen Feng Lu, Anders E. W. Jarfors, Per Jansson,
- Abstract summary: Ontologies are essential for structuring domain knowledge, improving accessibility, sharing, and reuse.<n>Traditional ontologies rely on manual annotation and conventional natural language processing (NLP) techniques.<n>The rise of Large Language Models (LLMs) offers new possibilities for automating knowledge extraction.<n>This study investigates three LLM-based approaches, including pre-trained LLM-driven method, in-context learning (ICL) method and fine-tuning method to extract terms and relations from domain-specific texts.
- Score: 14.475791894420666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ontologies are essential for structuring domain knowledge, improving accessibility, sharing, and reuse. However, traditional ontology construction relies on manual annotation and conventional natural language processing (NLP) techniques, making the process labour-intensive and costly, especially in specialised fields like casting manufacturing. The rise of Large Language Models (LLMs) offers new possibilities for automating knowledge extraction. This study investigates three LLM-based approaches, including pre-trained LLM-driven method, in-context learning (ICL) method and fine-tuning method to extract terms and relations from domain-specific texts using limited data. We compare their performances and use the best-performing method to build a casting ontology that validated by domian expert.
Related papers
- Integrating Domain Knowledge into Process Discovery Using Large Language Models [3.7448613209842967]
We propose an interactive framework that incorporates domain knowledge, expressed in natural language, into the process discovery pipeline.<n>The framework coordinates interactions among the Large Language Models (LLMs), domain experts, and a set of backend services.<n>Our empirical study includes a case study based on a real-life event log with the involvement of domain experts, who assessed the usability and effectiveness of the framework.
arXiv Detail & Related papers (2025-10-08T15:59:11Z) - LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards [0.0]
Software engineering standards (SES) consist of long, unstructured text (with high noise) and paragraphs with domain-specific terms.<n>This work proposes an open-source large language model (LLM)-assisted approach to RTE for SES.
arXiv Detail & Related papers (2025-08-29T17:14:54Z) - Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation [15.73070332657629]
Large language models (LLMs) have achieved significant advancements in software mining.<n> handling the syntactic structure of source code remains a challenge.<n>This paper employs incontext learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z) - Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them [9.952432291248954]
We investigate the use of LLM-generated data for continual pretraining of encoder models in domains with limited data.<n>We compile a benchmark specifically designed for assessing embedding model performance in invasion biology.<n>Our results demonstrate that this approach achieves a fully automated pipeline for enhancing domain-specific understanding of small encoder models.
arXiv Detail & Related papers (2025-03-27T21:51:24Z) - MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM)<n>MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task.<n>LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them.
arXiv Detail & Related papers (2025-03-26T11:09:21Z) - NLP4PBM: A Systematic Review on Process Extraction using Natural Language Processing with Rule-based, Machine and Deep Learning Methods [0.0]
This literature review studies the field of automated process extraction, i.e., transforming textual descriptions into structured processes using Natural Language Processing (NLP)
We found that Machine Learning (ML) / Deep Learning (DL) methods are being increasingly used for the NLP component.
In some cases, they were chosen for their suitability towards process extraction, and results show that they can outperform classic rule-based methods.
arXiv Detail & Related papers (2024-09-10T15:16:02Z) - Structure-aware Domain Knowledge Injection for Large Language Models [38.08691252042949]
StructTuning is a methodology to transform Large Language Models (LLMs) into domain specialists.<n>It significantly reduces the training corpus needs to a mere 5% while achieving an impressive 100% of traditional knowledge injection performance.
arXiv Detail & Related papers (2024-07-23T12:38:48Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA)
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific
Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph
Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z) - Information Extraction in Low-Resource Scenarios: Survey and Perspective [56.5556523013924]
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from emphtraditional and emphLLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.