Stratified Data Integration
- URL: http://arxiv.org/abs/2105.09432v1
- Date: Wed, 19 May 2021 23:14:41 GMT
- Title: Stratified Data Integration
- Authors: Fausto Giunchiglia, Alessio Zamboni, Mayukh Bagchi and Simone Bocca
- Abstract summary: We state the problem of semantic heterogeneity as a problem of Representation Diversity.
We describe the proposed stratified representation of data and the process by which data are first transformed into the target representation.
The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.
- Score: 3.8902657229395907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach to the problem of semantic heterogeneity where
data are organized into a set of stratified and independent representation
layers, namely: conceptual (where a set of unique alinguistic identifiers are
connected inside a graph codifying their meaning), language (where sets of
synonyms, possibly from multiple languages, annotate concepts), knowledge (in
the form of a graph where nodes are entity types and links are properties), and
data (in the form of a graph of entities populating the previous knowledge
graph). This allows us to state the problem of semantic heterogeneity as a
problem of Representation Diversity where the different types of heterogeneity,
viz. Conceptual, Language, Knowledge, and Data, are dealt with uniformly within each
single layer, independently of the others. In this paper we describe the
proposed stratified representation of data and the process by which data are
first transformed into the target representation, then suitably integrated, and
finally presented to the user in her preferred format. The proposed
framework has been evaluated in various pilot case studies and in a number of
industrial data integration problems.
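The abstract describes the four layers only in prose. As a hedged illustration, the Python sketch below models each layer as a separate structure and integrates two sources one layer at a time, so that, for instance, an English label and an Italian label for the same concept meet in the language layer without any other layer being consulted. All class and function names, the concept ids such as "C1", and the naive union-based merge are assumptions made for illustration; they are not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch: names are illustrative and not taken from the paper.
# Concept ids (e.g. "C1") stand in for the unique alinguistic identifiers.


@dataclass
class ConceptLayer:
    ids: set[str] = field(default_factory=set)                # alinguistic concept ids
    edges: set[tuple[str, str]] = field(default_factory=set)  # links codifying meaning


@dataclass
class LanguageLayer:
    synsets: dict[str, set[str]] = field(default_factory=dict)  # concept id -> synonyms

    def concept_for(self, word: str) -> str | None:
        """Return the concept id annotated by `word`, or None if unknown."""
        for cid, words in self.synsets.items():
            if word in words:
                return cid
        return None


@dataclass
class KnowledgeLayer:
    etypes: set[str] = field(default_factory=set)  # entity types, named by concept id
    props: dict[str, tuple[str, str]] = field(default_factory=dict)  # prop -> (domain, range)


@dataclass
class DataLayer:
    entities: dict[str, str] = field(default_factory=dict)  # entity id -> entity type


@dataclass
class Source:
    concepts: ConceptLayer
    language: LanguageLayer
    knowledge: KnowledgeLayer
    data: DataLayer


def integrate(a: Source, b: Source) -> Source:
    """Merge two sources layer by layer; each step reads only its own layer.

    Union semantics are an assumption: conflict resolution is omitted.
    """
    synsets = {cid: set(ws) for cid, ws in a.language.synsets.items()}
    for cid, ws in b.language.synsets.items():
        synsets.setdefault(cid, set()).update(ws)
    return Source(
        ConceptLayer(a.concepts.ids | b.concepts.ids, a.concepts.edges | b.concepts.edges),
        LanguageLayer(synsets),
        KnowledgeLayer(a.knowledge.etypes | b.knowledge.etypes,
                       {**a.knowledge.props, **b.knowledge.props}),
        DataLayer({**a.data.entities, **b.data.entities}),
    )


# Usage: an English and an Italian source that share concept "C1".
en = Source(ConceptLayer({"C1"}), LanguageLayer({"C1": {"city", "town"}}),
            KnowledgeLayer({"C1"}), DataLayer({"e1": "C1"}))
it = Source(ConceptLayer({"C1", "C2"}, {("C2", "C1")}),
            LanguageLayer({"C1": {"città"}, "C2": {"capitale"}}),
            KnowledgeLayer({"C1", "C2"}), DataLayer({"e2": "C2"}))
merged = integrate(en, it)
print(merged.language.concept_for("città"))  # C1
print(merged.language.concept_for("town"))   # C1
print(sorted(merged.data.entities))          # ['e1', 'e2']
```

Keying every layer on shared concept ids is what lets each merge step ignore the other layers in this sketch; a production system would also need to detect and reconcile conflicting entries, which plain dictionary and set unions do not do.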
Related papers
- Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data [49.77103348208835]
We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution through a weighted sum of their Laplacians.
We propose a framework to infer the graph dictionary representation from observed data, along with a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem.
We exploit graph-dictionary representations in a motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods.
(2024-11-08T17:40:43Z)
- Federated Graph Semantic and Structural Learning [54.97668931176513]
This paper reveals that local client distortion is caused by both node-level semantics and graph-level structure.
We postulate that a well-structured graph neural network assigns similar representations to neighboring nodes because of the inherent adjacency relationships.
We transform the adjacency relationships into a similarity distribution and leverage the global model to distill the relation knowledge into the local model.
(2024-06-27T07:08:28Z)
- Disentangled Hyperbolic Representation Learning for Heterogeneous Graphs [29.065531121422204]
We propose $\text{Dis-H}^2\text{GCN}$, a Disentangled Hyperbolic Heterogeneous Graph Convolutional Network.
We evaluate our proposed $\text{Dis-H}^2\text{GCN}$ on five real-world heterogeneous graph datasets.
(2024-06-14T18:50:47Z)
- Comparing the information content of probabilistic representation spaces [3.7277730514654555]
Probabilistic representation spaces convey information about a dataset, and to understand the effects of factors such as training loss and network architecture, we seek to compare the information content of such spaces.
Here, instead of building upon point-based measures of comparison, we build upon classic methods from literature on hard clustering.
We propose a practical method of estimation that is based on fingerprinting a representation space with a sample of the dataset and is applicable when the communicated information is only a handful of bits.
(2024-05-31T17:33:07Z)
- Flexible inference in heterogeneous and attributed multilayer networks [21.349513661012498]
We develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information.
We demonstrate its ability to unveil a variety of patterns in a social support network among villagers in rural India.
(2024-05-31T15:21:59Z)
- A Novel Multidimensional Reference Model For Heterogeneous Textual Datasets Using Context, Semantic And Syntactic Clues [4.453735522794044]
This study aims to produce a novel multidimensional reference model (MRM) using categories for heterogeneous datasets.
The main contribution of MRM is that it checks each token against each term based on an index of linguistic categories such as synonym, antonym, formal, lexical word order and co-occurrence.
(2023-11-10T17:02:25Z)
- KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis.
We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics.
A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
(2023-08-15T02:38:08Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without sharing data, for privacy reasons.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
(2022-10-01T09:04:17Z)
- VAE-CE: Visual Contrastive Explanation using Disentangled VAEs [3.5027291542274357]
The paper proposes Variational Autoencoder-based Contrastive Explanation (VAE-CE).
We build the model using a disentangled VAE, extended with a new supervised method for disentangling individual dimensions.
An analysis on synthetic data and MNIST shows that the approaches to both disentanglement and explanation provide benefits over other methods.
(2021-08-20T13:15:24Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Network (SGCN) that explores the implicit semantics by learning latent semantic paths in graphs.
(2021-01-16T16:18:43Z)
- DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations [89.78473564527688]
This paper shows how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is more or less comparable to the counterpart trained with full human annotations.
(2020-11-24T08:15:53Z)