Related papers: Stratified Data Integration

Stratified Data Integration

URL: http://arxiv.org/abs/2105.09432v1
Date: Wed, 19 May 2021 23:14:41 GMT
Title: Stratified Data Integration
Authors: Fausto Giunchiglia, Alessio Zamboni, Mayukh Bagchi and Simone Bocca
Abstract summary: We state the problem of semantic heterogeneity as a problem of Representation Diversity. We describe the proposed stratified representation of data and the process by which data are first transformed into the target representation, then integrated and, finally, presented to the user. The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.
Score: 3.8902657229395907
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected inside a graph codifying their meaning), language (where sets of synonyms, possibly from multiple languages, annotate concepts), knowledge (in the form of a graph where nodes are entity types and links are properties), and data (in the form of a graph of entities populating the previous knowledge graph). This allows us to state the problem of semantic heterogeneity as a problem of Representation Diversity where the different types of heterogeneity, viz. Conceptual, Language, Knowledge, and Data, are uniformly dealt with inside each single layer, independently of the others. In this paper, we describe the proposed stratified representation of data and the process by which data are first transformed into the target representation, then suitably integrated and, finally, presented to the user in her preferred format. The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.
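To make the four layers concrete, here is a minimal sketch, assuming a drastically simplified in-memory model. The class StratifiedModel, its methods, and concept ids such as "C:car" are hypothetical illustrations, not the authors' data model or API; the knowledge layer is also simplified to "entity type -> allowed properties". The point it shows is the one the abstract makes: language differences (English vs. Italian field names) are resolved inside the language layer alone, so two sources end up with identical concept ids in the data layer, and presentation picks a label in the user's language.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch of the four stratified layers named in the abstract.
# Concept ids such as "C:car", the class, and its methods are illustrative
# assumptions, not the authors' data model or API.


@dataclass
class StratifiedModel:
    # Conceptual layer: alinguistic concept ids linked by meaning relations.
    concept_edges: set[tuple[str, str, str]] = field(default_factory=set)
    # Language layer: (language, word) -> concept id; synonyms share a concept.
    lexicon: dict[tuple[str, str], str] = field(default_factory=dict)
    # Knowledge layer (simplified): entity type -> allowed properties, as concept ids.
    etypes: dict[str, set[str]] = field(default_factory=dict)
    # Data layer: entity id -> (entity type, {property: value}).
    entities: dict[str, tuple[str, dict[str, object]]] = field(default_factory=dict)

    def add_word(self, lang: str, word: str, concept: str) -> None:
        self.lexicon[(lang, word.lower())] = concept

    def to_concept(self, lang: str, word: str) -> str:
        # Language heterogeneity is resolved in the language layer alone.
        return self.lexicon[(lang, word.lower())]

    def label(self, concept: str, lang: str) -> str:
        # Presentation: return the first word registered for this concept in `lang`.
        for (word_lang, word), c in self.lexicon.items():
            if word_lang == lang and c == concept:
                return word
        raise KeyError((concept, lang))

    def ingest(self, eid: str, lang: str, etype_word: str,
               record: dict[str, object]) -> None:
        # Transformation: map source type and field names to concept ids, then
        # check them against the knowledge layer before storing the entity.
        etype = self.to_concept(lang, etype_word)
        props = {self.to_concept(lang, k): v for k, v in record.items()}
        unknown = set(props) - self.etypes.get(etype, set())
        if unknown:
            raise ValueError(f"properties not in knowledge layer: {sorted(unknown)}")
        self.entities[eid] = (etype, props)


# Usage: two sources, one English and one Italian, describing the same kind of entity.
m = StratifiedModel()
m.concept_edges.add(("C:car", "is_a", "C:vehicle"))
for lang, word, concept in [("en", "car", "C:car"), ("en", "automobile", "C:car"),
                            ("it", "automobile", "C:car"), ("en", "colour", "C:colour"),
                            ("it", "colore", "C:colour")]:
    m.add_word(lang, word, concept)
m.etypes["C:car"] = {"C:colour"}

m.ingest("e1", "en", "car", {"colour": "red"})
m.ingest("e2", "it", "automobile", {"colore": "rosso"})
print(m.entities)                 # both entities share the same type and property ids
print(m.label("C:colour", "it"))  # colore
```

Keeping each kind of heterogeneity in its own structure is the design choice the sketch mirrors: adding a new language only touches the lexicon, while the knowledge and data layers stay unchanged.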

Related papers

Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data [49.77103348208835]
We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution through a weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed data, along with a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. We exploit graph-dictionary representations in a motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods.
arXiv Detail & Related papers (2024-11-08T17:40:43Z)
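The Graph-Dictionary summary describes relationships in the data through a weighted sum of graph Laplacians, L = w_1 L_1 + ... + w_K L_K. The pure-Python sketch below computes only that combination, assuming the combinatorial Laplacian L_k = D_k - A_k (degree matrix minus adjacency matrix) of small undirected graphs. The function names and example graphs are hypothetical, and the sketch does not reproduce the paper's inference procedure or its primal-dual solver.

```python
from __future__ import annotations


def laplacian(adj: list[list[float]]) -> list[list[float]]:
    # Combinatorial Laplacian L = D - A of a symmetric adjacency matrix (assumed form).
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def dictionary_laplacian(atoms: list[list[list[float]]],
                         weights: list[float]) -> list[list[float]]:
    # Weighted sum of the dictionary atoms' Laplacians: sum_k weights[k] * L_k.
    if len(atoms) != len(weights):
        raise ValueError("need exactly one weight per graph atom")
    n = len(atoms[0])
    total = [[0.0] * n for _ in range(n)]
    for adj, w in zip(atoms, weights):
        lap = laplacian(adj)
        for i in range(n):
            for j in range(n):
                total[i][j] += w * lap[i][j]
    return total


# Two hypothetical 3-node atoms: a single edge 0-1 and a single edge 1-2.
a1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
a2 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
L = dictionary_laplacian([a1, a2], [0.5, 2.0])
print(L)  # [[0.5, -0.5, 0.0], [-0.5, 2.5, -2.0], [0.0, -2.0, 2.0]]
```

Because each L_k has zero row sums, any weighted combination does too, so the result is again a valid Laplacian-like matrix.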
Federated Graph Semantic and Structural Learning [54.97668931176513]
This paper reveals that local client distortion is caused by both node-level semantics and graph-level structure. We postulate that a well-structured graph neural network produces similar representations for neighboring nodes because of the inherent adjacency relationships. We transform the adjacency relationships into a similarity distribution and leverage the global model to distill the relation knowledge into the local model.
arXiv Detail & Related papers (2024-06-27T07:08:28Z)
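One possible reading of "transform the adjacency relationships into the similarity distribution" is the following. For each node, a softmax over the similarities between its embedding and its neighbors' embeddings yields a distribution, and the local model is penalized by the KL divergence from the global model's distribution. The sketch below implements that reading in plain Python. Cosine similarity, the temperature tau, the KL direction, and all names are assumptions, not the paper's exact loss.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity of two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbor_distribution(emb: dict[int, list[float]], node: int,
                          neighbors: list[int], tau: float = 0.5) -> list[float]:
    # Softmax (numerically stabilized) over node-neighbor similarities scaled by 1/tau.
    logits = [cosine(emb[node], emb[j]) / tau for j in neighbors]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def relation_distillation_loss(global_emb: dict[int, list[float]],
                               local_emb: dict[int, list[float]],
                               adjacency: dict[int, list[int]],
                               tau: float = 0.5) -> float:
    # Sum over nodes of KL(global || local) between neighbor-similarity distributions.
    loss = 0.0
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue
        p = neighbor_distribution(global_emb, node, neighbors, tau)
        q = neighbor_distribution(local_emb, node, neighbors, tau)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss


# Hypothetical toy graph and embeddings: the local model swaps node 0's neighbor ranking.
adj = {0: [1, 2], 1: [0], 2: [0]}
g = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 1.0]}
loc = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.1]}
print(relation_distillation_loss(g, g, adj))    # 0.0 when local matches global
print(relation_distillation_loss(g, loc, adj))  # positive when relations disagree
```

Nodes with a single neighbor contribute nothing, since a one-element distribution is always [1.0]; only the relative ranking among several neighbors carries relation knowledge in this formulation.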
Disentangled Hyperbolic Representation Learning for Heterogeneous Graphs [29.065531121422204]
We propose Dis-H²GCN, a Disentangled Hyperbolic Heterogeneous Graph Convolutional Network. We evaluate our proposed Dis-H²GCN on five real-world heterogeneous graph datasets.
arXiv Detail & Related papers (2024-06-14T18:50:47Z)
Learning Joint and Individual Structure in Network Data with Covariates [1.6874375111244329]
This work formulates a low-rank model that simultaneously captures joint and individual information in network data. We show that the method is able to consistently recover the joint and individual components under a general signal-plus-noise model. In particular, the application of the methodology to a food trade network yields joint and individual factors that explain the trading patterns.
arXiv Detail & Related papers (2024-06-13T03:10:56Z)
Comparing the information content of probabilistic representation spaces [3.7277730514654555]
Probabilistic representation spaces convey information about a dataset. To understand the effects of factors such as training loss and network architecture, we seek to compare the information content of such spaces. Here, instead of building upon point-based measures of comparison, we build upon classic methods from the literature on hard clustering. We propose a practical method of estimation that is based on fingerprinting a representation space with a sample of the dataset and is applicable when the communicated information is only a handful of bits.
arXiv Detail & Related papers (2024-05-31T17:33:07Z)
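The summary says the work builds on classic methods for comparing hard clusterings and targets information measured in a handful of bits. A standard comparison of that kind is the mutual information, in bits, between two partitions of the same sample. The sketch below computes it for two hard labelings, for example cluster assignments of the same sampled points in two representation spaces. It is a generic illustration under that assumption, not the paper's estimator for probabilistic spaces, and the function name is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Hashable, Sequence


def mutual_information_bits(labels_a: Sequence[Hashable],
                            labels_b: Sequence[Hashable]) -> float:
    # I(A;B) in bits between two hard partitions of the same sample points.
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same sample")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Same two-way split under different label names carries one full bit.
print(mutual_information_bits([0, 0, 1, 1], ["x", "x", "y", "y"]))  # 1.0
# Independent splits share no information.
print(mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Mutual information ignores label names, which is why it suits comparing cluster assignments that come from two unrelated spaces.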
Flexible inference in heterogeneous and attributed multilayer networks [21.349513661012498]
We develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information. We demonstrate its ability to unveil a variety of patterns in a social support network among villagers in rural India.
arXiv Detail & Related papers (2024-05-31T15:21:59Z)
HiGPT: Heterogeneous Graph Language Model [27.390123898556805]
Heterogeneous graph learning aims to capture complex relationships and diverse semantics among entities in a heterogeneous graph. Existing frameworks for heterogeneous graph learning have limitations in generalizing across diverse heterogeneous graph datasets. We propose HiGPT, a general large graph model with a heterogeneous graph instruction-tuning paradigm.
arXiv Detail & Related papers (2024-02-25T08:07:22Z)
A Novel Multidimensional Reference Model For Heterogeneous Textual Datasets Using Context, Semantic And Syntactic Clues [4.453735522794044]
This study aims to produce a novel multidimensional reference model (MRM) using categories for heterogeneous datasets. The main contribution of MRM is that it checks each token against each term based on an index of linguistic categories such as synonym, antonym, formal, lexical word order and co-occurrence.
arXiv Detail & Related papers (2023-11-10T17:02:25Z)
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics. A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
arXiv Detail & Related papers (2023-08-15T02:38:08Z)
Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without sharing data, for privacy reasons. We study how data heterogeneity affects the representations of the globally aggregated models. We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
VAE-CE: Visual Contrastive Explanation using Disentangled VAEs [3.5027291542274357]
Variational Autoencoder-based Contrastive Explanation (VAE-CE): we build the model using a disentangled VAE, extended with a new supervised method for disentangling individual dimensions. An analysis on synthetic data and MNIST shows that both the disentanglement and the explanation approaches provide benefits over other methods.
arXiv Detail & Related papers (2021-08-20T13:15:24Z)
Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole. We propose a Semantic Graph Convolutional Network (SGCN) that explores the implicit semantics by learning latent semantic paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations [89.78473564527688]
This paper shows how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model. In this way, human annotations are no longer required, and the method scales to large and diverse real-world datasets. Experimental results show that the proposed annotation-free method is roughly comparable to its counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.