Enhancing Data Space Semantic Interoperability through Machine Learning:
a Visionary Perspective
- URL: http://arxiv.org/abs/2303.08932v1
- Date: Wed, 15 Mar 2023 20:57:31 GMT
- Title: Enhancing Data Space Semantic Interoperability through Machine Learning:
a Visionary Perspective
- Authors: Zeyd Boukhers and Christoph Lange and Oya Beyan
- Abstract summary: Our vision paper outlines a plan to improve the future of semantic interoperability in data spaces through the application of machine learning.
By leveraging the power of machine learning, we believe that semantic interoperability in data spaces can be significantly improved.
Our vision for the future of data spaces addresses the limitations of conventional data exchange and makes data more accessible and valuable for all members of the community.
- Score: 5.994412766684842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our vision paper outlines a plan to improve the future of semantic
interoperability in data spaces through the application of machine learning.
The use of data spaces, where data is exchanged among members in a
self-regulated environment, is becoming increasingly popular. However, the
current manual practices of managing metadata and vocabularies in these spaces
are time-consuming, prone to errors, and may not meet the needs of all
stakeholders. By leveraging the power of machine learning, we believe that
semantic interoperability in data spaces can be significantly improved. This
involves automatically generating and updating metadata, which results in a
more flexible vocabulary that can accommodate the diverse terminologies used by
different sub-communities. Our vision for the future of data spaces addresses
the limitations of conventional data exchange and makes data more accessible
and valuable for all members of the community.
Related papers
- Automated Annotation of Evolving Corpora for Augmenting Longitudinal Network Data: A Framework Integrating Large Language Models and Expert Knowledge [27.879485905967577]
This paper presents the Expert-Augmented LLM (EALA) approach, which leverages Large Language Models (LLMs) in combination with historically annotated data and expert-constructed codebooks to extrapolate and extend datasets into future periods.
Our findings demonstrate that EALA effectively predicts nuanced interactions between negotiation parties and captures the evolution of topics over time.
Given the wide availability of codebooks and annotated datasets, EALA holds substantial promise for advancing research in political science and beyond.
arXiv Detail & Related papers (2025-03-03T15:46:01Z)
- From Open-Vocabulary to Vocabulary-Free Semantic Segmentation [78.62232202171919]
Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data.
Current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications.
This work proposes a Vocabulary-Free Semantic Segmentation pipeline, eliminating the need for predefined class vocabularies.
arXiv Detail & Related papers (2025-02-17T15:17:08Z)
- SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation [0.3017070810884304]
We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain.
The text in SPACE-IDEAS varies greatly and includes informal, technical, academic and business-oriented writing styles.
In addition to a manually annotated dataset we release an extended version that is annotated using a large generative language model.
arXiv Detail & Related papers (2024-03-25T17:04:02Z)
- Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization [57.38123229553157]
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems.
We focus on achieving language adaptation using minimal labeled and unlabeled data.
Experimental results show that our framework is able to synthesize intelligible speech in unseen languages with only 4 utterances of labeled data and 15 minutes of unlabeled data.
arXiv Detail & Related papers (2024-01-23T21:55:34Z)
- Augmented Datasheets for Speech Datasets and Ethical Decision-Making [2.7106766103546236]
Speech datasets are crucial for training Speech Language Technologies (SLT).
Lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products.
There is often a lack of oversight on the underlying training data with regard to the ethics of such data collection.
arXiv Detail & Related papers (2023-05-08T12:49:04Z)
- A Data Fusion Framework for Multi-Domain Morality Learning [3.0671872389903547]
We describe a data fusion framework for training on multiple heterogeneous datasets.
The proposed framework achieves state-of-the-art performance on several datasets compared to prior work in morality inference.
arXiv Detail & Related papers (2023-04-04T22:05:02Z)
- Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets [17.503843467554592]
We propose a principled approach that supports learning from heterogeneous datasets with different label sets.
Our idea is to utilize a pre-trained language model to embed discrete labels to a continuous latent space with the help of their label names.
Our model outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2023-03-19T06:14:22Z)
- A Vision for Semantically Enriched Data Science [19.604667287258724]
Key areas such as the use of domain knowledge and data semantics have seen little automation.
We envision how leveraging "semantic" understanding and reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
arXiv Detail & Related papers (2023-03-02T16:03:12Z)
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models [70.82705830137708]
We introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL).
We utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data.
DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
arXiv Detail & Related papers (2022-11-21T18:56:00Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
- DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations [89.78473564527688]
This paper shows how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is more or less comparable to the counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.