Transforming Expert Knowledge into Scalable Ontology via Large Language Models
- URL: http://arxiv.org/abs/2506.08422v2
- Date: Wed, 11 Jun 2025 03:16:55 GMT
- Title: Transforming Expert Knowledge into Scalable Ontology via Large Language Models
- Authors: Ikkei Itoku, David Theil, Evelyn Eichelsdoerfer Uehara, Sreyoshi Bhaduri, Junnosuke Kuroda, Toshi Yumoto, Alex Gil, Natalie Perez, Rajesh Cherukuri, Naumaan Nayyar
- Abstract summary: Traditional approaches to taxonomy alignment rely on expert review of concept pairs. We propose a novel framework that combines large language models (LLMs) with expert calibration and iterative prompt optimization. In evaluating our framework on a domain-specific mapping task of concept essentiality, we achieved an F1-score of 0.97, substantially exceeding the human benchmark of 0.68.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Having a unified, coherent taxonomy is essential for effective knowledge representation in domain-specific applications as diverse terminologies need to be mapped to underlying concepts. Traditional manual approaches to taxonomy alignment rely on expert review of concept pairs, but this becomes prohibitively expensive and time-consuming at scale, while subjective interpretations often lead to expert disagreements. Existing automated methods for taxonomy alignment have shown promise but face limitations in handling nuanced semantic relationships and maintaining consistency across different domains. These approaches often struggle with context-dependent concept mappings and lack transparent reasoning processes. We propose a novel framework that combines large language models (LLMs) with expert calibration and iterative prompt optimization to automate taxonomy alignment. Our method integrates expert-labeled examples, multi-stage prompt engineering, and human validation to guide LLMs in generating both taxonomy linkages and supporting rationales. In evaluating our framework on a domain-specific mapping task of concept essentiality, we achieved an F1-score of 0.97, substantially exceeding the human benchmark of 0.68. These results demonstrate the effectiveness of our approach in scaling taxonomy alignment while maintaining high-quality mappings and preserving expert oversight for ambiguous cases.
Related papers
- Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks [81.44256822500257]
RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences. RLHF exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks. We propose a novel Multi-level Aware Preference Learning (MAPL) framework, capable of enhancing multi-instruction capabilities.
arXiv Detail & Related papers (2025-05-19T08:33:11Z)
- Enforcing Consistency and Fairness in Multi-level Hierarchical Classification with a Mask-based Output Layer [25.819440955594736]
We introduce a fair, model-agnostic layer designed to enforce taxonomy and optimize objectives, including consistency, fairness, and exact match. Our evaluations demonstrate that the proposed layer not only improves the fairness of predictions but also enforces the taxonomy, resulting in consistent predictions and superior performance.
arXiv Detail & Related papers (2025-03-19T06:30:04Z)
- Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark [63.97125827026949]
This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. A benchmark is proposed that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback.
arXiv Detail & Related papers (2025-03-13T13:37:54Z)
- ExpertGenQA: Open-ended QA generation in Specialized Domains [9.412082058055823]
ExpertGenQA is a protocol that combines few-shot learning with structured topic and style categorization to generate comprehensive domain-specific QA pairs. We show that ExpertGenQA achieves twice the efficiency of baseline few-shot approaches while maintaining 94.4% topic coverage.
arXiv Detail & Related papers (2025-03-04T19:09:48Z)
- CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts [40.52605902842168]
Taxonomies play a crucial role in various applications by providing a structural representation of knowledge. Previous approaches typically relied on self-supervised methods that generate annotation data from existing taxonomy. We introduce CodeTaxo, a novel approach that leverages large language models through code language prompts to capture the taxonomic structure.
arXiv Detail & Related papers (2024-08-17T02:15:07Z)
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings [0.0]
This paper improves upon the vector embeddings of scientific text by assembling niche datasets using co-citations as a similarity metric. We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts.
arXiv Detail & Related papers (2024-01-28T17:34:42Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
- Towards Feasible Counterfactual Explanations: A Taxonomy Guided Template-based NLG Method [0.5003525838309206]
Counterfactual Explanations (cf-XAI) describe the smallest changes in feature values necessary to change an outcome from one class to another.
Many cf-XAI methods neglect the feasibility of those changes.
We introduce a novel approach for presenting cf-XAI in natural language (Natural-XAI).
arXiv Detail & Related papers (2023-10-03T12:48:57Z)
- Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction [0.8670827427401335]
We present a general framework for taxonomy construction that takes into account structural constraints.
We compare the prompting and fine-tuning approaches performed on a hypernym taxonomy and a novel computer science taxonomy dataset.
arXiv Detail & Related papers (2023-09-04T16:53:17Z)
- Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to incorporate the current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z)
- Evaluation of semantic relations impact in query expansion-based retrieval systems [0.29008108937701327]
This paper generates resources using the labels of a given taxonomy as source of information.
The obtained resources are integrated into a plain classifier for reformulating a set of input queries as intents.
The evaluation employs a wide and varied taxonomy as a use-case, exploiting its labels as basis for the semantic expansion.
arXiv Detail & Related papers (2022-03-30T12:06:32Z)
- TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)