Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs
- URL: http://arxiv.org/abs/2602.14795v1
- Date: Mon, 16 Feb 2026 14:42:14 GMT
- Title: Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs
- Authors: Ivan Diliso, Roberto Barile, Claudia d'Amato, Nicola Fanizzi
- Abstract summary: We present the first resource that provides a workflow for extracting datasets including both schema and ground facts. The resulting curated suite of datasets is ready for machine learning and reasoning services. We provide utilities for loading datasets in tensor representations typical of standard machine learning libraries.
- Score: 0.017283310584905027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts, retaining very limited schema-level knowledge even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present the first resource that provides a workflow for extracting datasets including both schema and ground facts, ready for machine learning and reasoning services, along with the resulting curated suite of datasets. The workflow also handles inconsistencies detected when keeping both schema and facts, and leverages reasoning to entail implicit knowledge. The suite includes newly extracted datasets from KGs with expressive schemas, and it also enriches existing datasets with schema information. Each dataset is serialized in OWL, making it ready for reasoning services. Moreover, we provide utilities for loading datasets in tensor representations typical of standard machine learning libraries.
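To make the idea of a "tensor representation" of ground facts concrete, here is a minimal sketch in plain Python. It is not the loading utility provided by the resource, whose API is not described here; the function name `index_triples` and the toy `ex:` facts are hypothetical. The sketch only shows the common convention of mapping each entity and relation to an integer ID, so that every fact becomes a row of three integers.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def index_triples(
    triples: List[Triple],
) -> Tuple[List[Tuple[int, int, int]], Dict[str, int], Dict[str, int]]:
    """Map (head, relation, tail) string triples to integer ID triples.

    Hypothetical helper for illustration; IDs are assigned in order of
    first appearance.
    """
    entity_ids: Dict[str, int] = {}
    relation_ids: Dict[str, int] = {}
    indexed: List[Tuple[int, int, int]] = []
    for head, relation, tail in triples:
        # setdefault returns the existing ID, or stores and returns a new one.
        h = entity_ids.setdefault(head, len(entity_ids))
        r = relation_ids.setdefault(relation, len(relation_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        indexed.append((h, r, t))
    return indexed, entity_ids, relation_ids


# Toy ground facts (made up for this example).
facts = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]
indexed, entity_ids, relation_ids = index_triples(facts)
print(indexed)  # [(0, 0, 1), (1, 1, 2), (0, 1, 2)]
```

The resulting list of integer triples has shape (number of facts, 3) and can be handed to an array constructor such as `numpy.array` or `torch.tensor`. Schema axioms are not covered by this sketch and would need their own encoding.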
Related papers
- Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding [10.19939896927137]
This paper describes the Live Semantic Web initiative, namely a first version of a gateway that has the main scope of leveraging the gold mine of relational data collected by many existing knowledge graphs.
arXiv Detail & Related papers (2023-11-21T09:22:02Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- GenIE: Generative Information Extraction [20.491645841368214]
We introduce GenIE, the first end-to-end autoregressive formulation of closed information extraction.
Our experiments show that GenIE is state-of-the-art on closed information extraction.
This work paves the way towards a unified end-to-end approach to the core tasks of information extraction.
arXiv Detail & Related papers (2021-12-15T18:45:14Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.