Research Knowledge Graphs in NFDI4DataScience: Key Activities, Achievements, and Future Directions
- URL: http://arxiv.org/abs/2508.02300v1
- Date: Mon, 04 Aug 2025 11:11:51 GMT
- Title: Research Knowledge Graphs in NFDI4DataScience: Key Activities, Achievements, and Future Directions
- Authors: Kanishka Silva, Marcel R. Ackermann, Heike Fliegl, Genet-Asefa Gesese, Fidan Limani, Philipp Mayr, Peter Mutschke, Allard Oelen, Muhammad Asif Suryani, Sharmila Upadhyaya, Benjamin Zapilko, Harald Sack, Stefan Dietze
- Abstract summary: NFDI4DataScience is developing and providing Research Knowledge Graphs (RKGs). RKGs aim to capture and connect complex datasets, models, software, and scientific publications.
- Score: 4.258678191793365
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As research in Artificial Intelligence and Data Science continues to grow in volume and complexity, it becomes increasingly difficult to ensure transparency, reproducibility, and discoverability. To address these challenges, as research artifacts should be understandable and usable by machines, the NFDI4DataScience consortium is developing and providing Research Knowledge Graphs (RKGs). Building upon earlier works, this paper presents recent progress in creating semantically rich RKGs using standardized ontologies, shared vocabularies, and automated Information Extraction techniques. Key achievements include the development of the NFDI4DS ontology, metadata standards, tools, and services designed to support the FAIR principles, as well as community-led projects and various implementations of RKGs. Together, these efforts aim to capture and connect the complex relationships between datasets, models, software, and scientific publications.
Related papers
- ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research [15.983924435685553]
We develop ScIRGen, a dataset generation framework for scientific QA & retrieval. We use it to create a large-scale scientific retrieval-augmented generation (RAG) dataset with realistic queries, datasets and papers.
arXiv Detail & Related papers (2025-06-09T11:47:13Z)
- Research Knowledge Graphs: the Shifting Paradigm of Scholarly Information Representation [2.967893090870586]
Research Knowledge Graphs (RKGs) aim at providing an easy-to-use and machine-actionable representation of research artifacts and their relations. This paper provides the first conceptualisation of the RKG vision, a categorisation of in-use RKGs together with a description of RKG building blocks and principles.
arXiv Detail & Related papers (2025-06-08T21:10:30Z)
- Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review [0.0]
This paper presents a comprehensive synthesis of major breakthroughs in artificial intelligence (AI) over the past fifteen years. It identifies key inflection points in AI's evolution by tracing the convergence of computational resources, data access, and algorithmic innovation.
arXiv Detail & Related papers (2025-05-22T15:12:48Z)
- Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models [8.299006259255572]
We propose Synthetic-on-Graph (SoG), a synthetic data generation framework that incorporates cross-document knowledge associations for efficient corpus expansion. SoG constructs a context graph by extracting entities and concepts from the original corpus, representing cross-document associations. To further improve synthetic data quality, we integrate Chain-of-Thought (CoT) and Contrastive Clarifying (CC) synthetic, enhancing reasoning processes and discriminative power.
arXiv Detail & Related papers (2025-05-02T03:40:39Z)
- Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey [69.0648659029394]
Spatio-Temporal (ST) data science is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Researchers have begun exploring the concept of Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. STFMs empower the entire workflow of ST data science, ranging from data sensing, management, to mining, thereby offering a more holistic and scalable approach.
arXiv Detail & Related papers (2025-03-12T09:42:18Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model [16.03026839786526]
This article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques integrated with large language models. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology. By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods.
arXiv Detail & Related papers (2024-04-03T21:46:14Z)
- Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS challenges.
arXiv Detail & Related papers (2023-07-21T15:04:00Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Challenges in biomarker discovery and biorepository for Gulf-war-disease studies: a novel data platform solution [48.7576911714538]
We introduce a novel data platform, named ROSALIND, to overcome the challenges, foster healthy and vital collaborations and advance scientific inquiries.
We follow the principles etched in the platform name - ROSALIND stands for resource organisms with self-governed accessibility, linkability, integrability, neutrality, and dependability.
The deployment of ROSALIND in our GWI study over the past 12 months has accelerated the pace of data experiments and analysis, removed numerous error sources, and increased research quality and productivity.
arXiv Detail & Related papers (2021-02-04T20:38:30Z)
- Graph signal processing for machine learning: A review and new perspectives [57.285378618394624]
We review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms.
We discuss exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability.
We provide new perspectives on future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other.
arXiv Detail & Related papers (2020-07-31T13:21:33Z)
- Deep Learning for Community Detection: Progress, Challenges and Opportunities [79.26787486888549]
This article summarizes the contributions of the various frameworks, models, and algorithms in deep neural networks.
arXiv Detail & Related papers (2020-05-17T11:22:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.