SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge
- URL: http://arxiv.org/abs/2506.21819v1
- Date: Mon, 14 Apr 2025 07:57:55 GMT
- Title: SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge
- Authors: Lena John, Kheir Eddine Farfar, Sören Auer, Oliver Karras
- Abstract summary: We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model. We develop a hybrid approach, called SciMantify, to support its evolving semantification. We implement the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge.
- Score: 0.4499833362998487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific publications, primarily digitized as PDFs, remain static and unstructured, limiting the accessibility and reusability of the contained knowledge. At best, scientific knowledge from publications is provided in tabular formats, which lack semantic context. A more flexible, structured, and semantic representation is needed to make scientific knowledge understandable and processable by both humans and machines. We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model, with five stages and defined criteria to guide the stepwise transition from a digital artifact, such as a PDF, to a semantic representation integrated in a knowledge graph (KG). Based on an exemplary workflow implementing the entire model, we developed a hybrid approach, called SciMantify, leveraging tabular formats of scientific knowledge, e.g., results from secondary studies, to support its evolving semantification. In the approach, humans and machines collaborate closely by performing semantic annotation tasks (SATs) and refining the results to progressively improve the semantic representation of scientific knowledge. We implemented the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge. A preliminary user experiment showed that the approach simplifies the preprocessing of scientific knowledge, reduces the effort for the evolving semantification, and enhances the knowledge representation through better alignment with the KG structures.
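To make the stepwise transition concrete, below is a minimal, hypothetical Python sketch of a five-stage evolution model and a human-machine semantic annotation step. The stage names, the `ScientificKnowledgeItem` structure, and the `semantic_annotation_task` function are illustrative assumptions, not the authors' implementation in ORKG.

```python
from enum import IntEnum
from dataclasses import dataclass, field

class RepresentationStage(IntEnum):
    """Illustrative five-stage evolution model, loosely following the 5-star LOD idea.
    Stage names are assumptions, not the paper's exact terminology."""
    DIGITAL_ARTIFACT = 1        # e.g., a PDF of the publication
    MACHINE_READABLE = 2        # extracted text / tables
    STRUCTURED_TABULAR = 3      # results in a tabular format (e.g., CSV)
    SEMANTICALLY_ANNOTATED = 4  # cells/columns linked to vocabulary terms
    KG_INTEGRATED = 5           # statements integrated into a knowledge graph

@dataclass
class ScientificKnowledgeItem:
    source: str                                 # e.g., path or DOI of the artifact
    stage: RepresentationStage
    table: list[dict] = field(default_factory=list)             # tabular content, if any
    annotations: dict[str, str] = field(default_factory=dict)   # column -> ontology term

def semantic_annotation_task(item: ScientificKnowledgeItem,
                             human_labels: dict[str, str]) -> ScientificKnowledgeItem:
    """One human-machine refinement step: merge machine suggestions with human labels
    and advance the item by one stage if its (here, simplified) criteria are met."""
    item.annotations.update(human_labels)
    if item.stage < RepresentationStage.KG_INTEGRATED and item.annotations:
        item.stage = RepresentationStage(item.stage + 1)
    return item
```

The sketch only illustrates how a tabular artifact could move stepwise toward a KG-integrated representation; the actual stage criteria and semantic annotation tasks (SATs) in the ORKG-based implementation are richer.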
Related papers
- The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes [0.0]
We introduce the Discovery Engine, a framework to transform literature into a unified, computationally tractable representation of a scientific domain. The Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
arXiv Detail & Related papers (2025-05-23T05:51:34Z)
- Knowledge AI: Fine-tuning NLP Models for Facilitating Scientific Knowledge Extraction and Understanding [0.0]
This project investigates the efficacy of Large Language Models (LLMs) in understanding and extracting scientific knowledge across specific domains.
We employ pre-trained models and fine-tune them on datasets in the scientific domain.
arXiv Detail & Related papers (2024-08-04T01:32:09Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
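For readers unfamiliar with lightweight adapters, here is a generic PyTorch-style sketch of a bottleneck adapter added on top of a transformer's hidden states; it illustrates the general technique only, not this paper's specific architecture or its KG-injection procedure.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection. Only the adapter parameters are trained,
    so injecting new (e.g., KG-derived) knowledge stays cheap."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: freeze the pre-trained PLM and train only the adapter weights
# on text or triples derived from a knowledge graph such as UMLS.
adapter = BottleneckAdapter()
x = torch.randn(2, 16, 768)   # (batch, sequence, hidden)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```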
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation.
We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature.
This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z)
- Nougat: Neural Optical Understanding for Academic Documents [15.242993369368111]
We propose a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language.
The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age.
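A minimal usage sketch, assuming the Hugging Face `transformers` integration of Nougat; the checkpoint identifier and processing steps are assumptions and should be verified against the official documentation.

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Assumed checkpoint name; check the official release for the exact identifier.
processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# One rasterized page of a scientific PDF.
page = Image.open("page_1.png").convert("RGB")
pixel_values = processor(images=page, return_tensors="pt").pixel_values

# Generate the markup (Markdown-like) transcription of the page.
outputs = model.generate(pixel_values, max_new_tokens=1024)
markup = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(markup)
```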
arXiv Detail & Related papers (2023-08-25T15:03:36Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction [46.56643271476249]
KnowledgeShovel is an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases.
The design of KnowledgeShovel introduces a multi-step, multi-modal AI collaboration pipeline to improve data accuracy while reducing the human burden.
A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.
arXiv Detail & Related papers (2022-10-06T11:38:18Z)
- A Computational Inflection for Scientific Discovery [48.176406062568674]
We stand at the foot of a significant inflection in the trajectory of scientific discovery.
As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge.
Computer science is poised to ignite a revolution in the scientific process itself.
arXiv Detail & Related papers (2022-05-04T11:36:54Z)
- Expressing High-Level Scientific Claims with Formal Semantics [0.8258451067861932]
We analyze the main claims from a sample of scientific articles from all disciplines.
We find that their semantics are more complex than what a straightforward application of formalisms like RDF or OWL accounts for.
We show how instantiating the five slots of a proposed super-pattern leads to a strictly defined statement in higher-order logic.
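One plausible shape of such a statement, assuming the five slots are CONTEXT, SUBJECT, QUALIFIER, RELATION, and OBJECT and reading the qualifier as a generalized quantifier; this is an illustrative reconstruction, not the paper's exact formalization.

```latex
\mathrm{QUALIFIER}\Big(\{\,x \mid \mathrm{SUBJECT}(x) \wedge \mathrm{occursIn}(x,\mathrm{CONTEXT})\,\},\;
  \lambda x.\ \exists y\,\big(\mathrm{OBJECT}(y) \wedge \mathrm{RELATION}(x,y)\big)\Big)
```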
arXiv Detail & Related papers (2021-09-27T09:52:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.