A Hybrid AI Methodology for Generating Ontologies of Research Topics from Scientific Paper Corpora
- URL: http://arxiv.org/abs/2508.04213v1
- Date: Wed, 06 Aug 2025 08:48:14 GMT
- Title: A Hybrid AI Methodology for Generating Ontologies of Research Topics from Scientific Paper Corpora
- Authors: Alessia Pisu, Livio Pompianu, Francesco Osborne, Diego Reforgiato Recupero, Daniele Riboni, Angelo Salatino
- Abstract summary: This paper presents Sci-OG, a semi-automated methodology for generating research topic ontologies. We evaluate this approach against a range of alternative solutions using a dataset of 21,649 manually annotated semantic triples.
- Score: 6.384357773998868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Taxonomies and ontologies of research topics (e.g., MeSH, UMLS, CSO, NLM) play a central role in providing the primary framework through which intelligent systems can explore and interpret the literature. However, these resources have traditionally been manually curated, a process that is time-consuming, prone to obsolescence, and limited in granularity. This paper presents Sci-OG, a semi-automated methodology for generating research topic ontologies, employing a multi-step approach: 1) Topic Discovery, extracting potential topics from research papers; 2) Relationship Classification, determining semantic relationships between topic pairs; and 3) Ontology Construction, refining and organizing topics into a structured ontology. The relationship classification component, which constitutes the core of the system, integrates an encoder-based language model with features describing topic occurrence in the scientific literature. We evaluate this approach against a range of alternative solutions using a dataset of 21,649 manually annotated semantic triples. Our method achieves the highest F1 score (0.951), surpassing various competing approaches, including a fine-tuned SciBERT model and several LLM baselines, such as the fine-tuned GPT4-mini. Our work is corroborated by a use case which illustrates the practical application of our system to extend the CSO ontology in the area of cybersecurity. The presented solution is designed to improve the accessibility, organization, and analysis of scientific knowledge, thereby supporting advancements in AI-enabled literature management and research exploration.
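The three steps named in the abstract can be illustrated with a toy sketch. This is not the Sci-OG implementation: the corpus, the function names, the relation labels (`broader`, `narrower`, `other`), and the 0.8 threshold are all assumptions made for illustration. In particular, the encoder-based classifier is replaced here by a simple co-occurrence subsumption rule, which only hints at how topic occurrence statistics might inform the relationship between two topics.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (assumed data): each paper is the set of candidate topics found in it,
# standing in for the output of a topic discovery step.
papers = [
    {"cybersecurity", "intrusion detection"},
    {"cybersecurity", "intrusion detection", "malware"},
    {"cybersecurity", "malware"},
    {"cybersecurity"},
    {"machine learning", "intrusion detection", "cybersecurity"},
    {"machine learning"},
]


def occurrence_features(papers):
    """Count how often each topic and each unordered topic pair occurs."""
    topic_count = Counter()
    pair_count = Counter()
    for topics in papers:
        topic_count.update(topics)
        pair_count.update(frozenset(p) for p in combinations(sorted(topics), 2))
    return topic_count, pair_count


def classify_pair(a, b, topic_count, pair_count, threshold=0.8):
    """Hypothetical rule-based stand-in for a learned relationship classifier.

    Returns 'broader' if a appears in most papers about b but not vice versa,
    'narrower' for the reverse case, and 'other' otherwise.
    """
    both = pair_count[frozenset((a, b))]
    p_a_given_b = both / topic_count[b]
    p_b_given_a = both / topic_count[a]
    if p_a_given_b >= threshold and p_b_given_a < threshold:
        return "broader"
    if p_b_given_a >= threshold and p_a_given_b < threshold:
        return "narrower"
    return "other"


def build_hierarchy(papers):
    """Assemble classified pairs into a parent -> children mapping."""
    topic_count, pair_count = occurrence_features(papers)
    children = {}
    for a, b in combinations(sorted(topic_count), 2):
        relation = classify_pair(a, b, topic_count, pair_count)
        if relation == "broader":
            children.setdefault(a, set()).add(b)
        elif relation == "narrower":
            children.setdefault(b, set()).add(a)
    return children


hierarchy = build_hierarchy(papers)
# hierarchy == {"cybersecurity": {"intrusion detection", "malware"}}
```

On this toy corpus, "cybersecurity" appears in every paper that mentions "intrusion detection" or "malware", so the rule places both under it, while "machine learning" stays unattached. A learned classifier of the kind the abstract describes would presumably combine such occurrence signals with text representations of the topics rather than rely on a fixed threshold.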
Related papers
- A Vision for Auto Research with LLM Agents [46.95148319863236]
This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. The system spans all major research phases, including literature review, ideation, methodology, experimentation, paper writing, peer review response, and dissemination.
arXiv Detail & Related papers (2025-04-26T02:06:10Z) - Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey [64.08485471150486]
This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. We systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication.
arXiv Detail & Related papers (2025-03-28T14:08:40Z) - A Socratic RAG Approach to Connect Natural Language Queries on Research Topics with Knowledge Organization Systems [0.3782392304044599]
We propose a Retrieval Augmented Generation (RAG) agent that maps natural language queries about research topics to machine-interpretable semantic entities. Our approach combines RAG with Socratic dialogue to align a user's intuitive understanding of research topics with established Knowledge Organization Systems.
arXiv Detail & Related papers (2025-02-20T19:58:59Z) - Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field [0.0]
This paper offers an analysis of the ability of large models to identify semantic relationships between different research topics. We developed a gold standard based on the IEEE Thesaurus to evaluate the task. Several models have achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral, and Claude 3-7B.
arXiv Detail & Related papers (2024-12-11T10:11:41Z) - Automating Intervention Discovery from Scientific Literature: A Progressive Ontology Prompting and Dual-LLM Framework [56.858564736806414]
This paper proposes a novel framework leveraging large language models (LLMs) to identify interventions in scientific literature. Our approach successfully identified 2,421 interventions from a corpus of 64,177 research articles in the speech-language pathology domain.
arXiv Detail & Related papers (2024-08-20T16:42:23Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data. However, the logical reasoning that ontologies can directly support is quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine learning.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z) - The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles [0.0]
We present a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO).
The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology.
The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods.
arXiv Detail & Related papers (2021-04-02T09:02:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.