OAEI Machine Learning Dataset for Online Model Generation
- URL: http://arxiv.org/abs/2404.18542v1
- Date: Mon, 29 Apr 2024 09:33:53 GMT
- Title: OAEI Machine Learning Dataset for Online Model Generation
- Authors: Sven Hertling, Ebrahim Norouzi, Harald Sack,
- Abstract summary: Ontology and knowledge graph matching systems are evaluated annually by the Ontology Alignment Evaluation Initiative (OAEI)
We introduce a dataset that contains training, validation, and test sets for most of the OAEI tracks.
- Score: 0.6472397166280683
- License:
- Abstract: Ontology and knowledge graph matching systems are evaluated annually by the Ontology Alignment Evaluation Initiative (OAEI). More and more systems use machine learning-based approaches, including large language models. The training and validation datasets are usually determined by the system developer and often a subset of the reference alignments are used. This sampling is against the OAEI rules and makes a fair comparison impossible. Furthermore, those models are trained offline (a trained and optimized model is packaged into the matcher) and therefore the systems are specifically trained for those tasks. In this paper, we introduce a dataset that contains training, validation, and test sets for most of the OAEI tracks. Thus, online model learning (the systems must adapt to the given input alignment without human intervention) is made possible to enable a fair comparison for ML-based systems. We showcase the usefulness of the dataset by fine-tuning the confidence thresholds of popular systems.
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Synthetic data generation for system identification: leveraging
knowledge transfer from similar systems [0.3749861135832073]
This paper introduces a novel approach for the generation of synthetic data, aimed at enhancing model generalization and robustness in scenarios characterized by data scarcity.
Synthetic data is generated through a pre-trained meta-model that describes a broad class of systems to which the system of interest is assumed to belong.
The efficacy of the approach is shown through a numerical example that highlights the advantages of integrating synthetic data into the system identification process.
arXiv Detail & Related papers (2024-03-08T09:09:15Z) - Deep active learning for nonlinear system identification [0.4485566425014746]
We develop a novel deep active learning acquisition scheme for nonlinear system identification.
Global exploration acquires a batch of initial states corresponding to the most informative state-action trajectories.
Local exploration solves an optimal control problem, finding the control trajectory that maximizes some measure of information.
arXiv Detail & Related papers (2023-02-24T14:46:36Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Using Data Assimilation to Train a Hybrid Forecast System that Combines
Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data is noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $rho$-gap.
We show how the $rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z) - Manifold for Machine Learning Assurance [9.594432031144716]
We propose an analogous approach for machine-learning (ML) systems using an ML technique that extracts from the high-dimensional training data implicitly describing the required system.
It is then harnessed for a range of quality assurance tasks such as test adequacy measurement, test input generation, and runtime monitoring of the target ML system.
Preliminary experiments establish that the proposed manifold-based approach, for test adequacy drives diversity in test data, for test generation yields fault-revealing yet realistic test cases, and for runtime monitoring provides an independent means to assess trustability of the target system's output.
arXiv Detail & Related papers (2020-02-08T11:39:01Z) - From Learning to Meta-Learning: Reduced Training Overhead and Complexity
for Communication Systems [40.427909614453526]
Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations.
With a meta-trained inductive bias, training of a machine learning model can be potentially carried out with reduced training data and/or time complexity.
This paper provides a high-level introduction to meta-learning with applications to communication systems.
arXiv Detail & Related papers (2020-01-05T12:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.