On Embeddings in Relational Databases
- URL: http://arxiv.org/abs/2005.06437v1
- Date: Wed, 13 May 2020 17:21:27 GMT
- Title: On Embeddings in Relational Databases
- Authors: Siddhant Arora, Srikanta Bedathur
- Abstract summary: We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embeddings take a naive approach: they consider complete denormalization of the database by materializing the full join of all tables and representing it as a knowledge graph.
In this paper we demonstrate a better methodology for learning representations that exploits the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
- Score: 11.52782249184251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of learning a distributed representation of entities
in a relational database using a low-dimensional embedding. Low-dimensional
embeddings aim to encapsulate a concise vector representation for an underlying
dataset with minimum loss of information. Embeddings across entities in a
relational database have been less explored due to the intricate data relations
and representation complexity involved. Relational databases are an
inter-weaved collection of relations that not only model relationships between
entities but also record complex domain-specific quantitative and temporal
attributes of data defining complex relationships among entities. Recent
methods for learning an embedding take a naive approach: they consider
complete denormalization of the database by materializing the full join of all
tables and representing it as a knowledge graph. This popular approach has
certain limitations, as it fails to capture the inter-row relationships and
additional semantics encoded in relational databases. In this paper we
demonstrate a better methodology for learning representations that exploits
the underlying semantics of columns in a table while using the relation joins
and the latent
inter-row relationships. Empirical results over a real-world database with
evaluations on similarity join and table completion tasks support our
proposition.
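To make the contrast concrete, here is a minimal sketch (not the authors' code) of the naive baseline described above: materialize the full join of the database and flatten every joined row into knowledge-graph triples that a standard KG embedding model could consume. The schema, table names, and relation labels are hypothetical.

```python
# A minimal sketch of the "naive" baseline the abstract criticizes: denormalize
# the database by materializing the full join, then emit (entity, relation, value)
# triples for an off-the-shelf knowledge-graph embedding model.
# Table and column names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, product TEXT);
INSERT INTO customers VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'phone'), (12, 2, 'desk');
""")

def full_join_triples(conn):
    """Materialize the full join and flatten every joined row into KG triples."""
    rows = conn.execute("""
        SELECT c.cust_id, c.city, o.order_id, o.product
        FROM customers c JOIN orders o ON c.cust_id = o.cust_id
    """).fetchall()
    triples = []
    for cust_id, city, order_id, product in rows:
        triples.append((f"customer:{cust_id}", "lives_in", f"city:{city}"))
        triples.append((f"customer:{cust_id}", "placed", f"order:{order_id}"))
        triples.append((f"order:{order_id}", "contains", f"product:{product}"))
    return triples

# Note how this flattened view discards column types (numeric, temporal) and any
# latent row-to-row similarity, which is exactly the gap the paper targets.
print(full_join_triples(conn))
```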
Related papers
- Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web [1.02801486034657]
We study dataset relationships from the perspective of users who discover, use, and share datasets on the Web.
We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery.
We demonstrate that machine-learning based methods that use dataset metadata achieve multi-class classification accuracy of 90%.
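As a rough, purely illustrative sketch of the kind of metadata-based classifier alluded to above (the paper's actual features, relationship taxonomy, and model are not specified here), one could train an off-the-shelf multi-class text classifier on dataset metadata pairs:

```python
# Hypothetical illustration only: classify the relationship between a pair of
# datasets from their textual metadata with a generic multi-class model.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example is the concatenated metadata of a dataset pair; the labels are
# illustrative relationship classes, not the paper's taxonomy.
pairs = [
    "census 2020 state level | census 2020 county level",
    "weather hourly v1 | weather hourly v2",
    "imagenet images | imagenet subset dogs",
]
labels = ["variant", "version", "subset"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(pairs, labels)
print(clf.predict(["traffic counts 2021 | traffic counts 2022"]))
```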
arXiv Detail & Related papers (2024-08-26T21:00:25Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
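A minimal sketch of the graph view this describes, under the assumption that rows become nodes and foreign keys become edges (the table names, features, and mean aggregation are illustrative, not the paper's architecture):

```python
# Assumed sketch: rows as nodes, foreign keys as edges, one mean-aggregation
# message passing step from child rows (orders) to parent rows (customers).
# A real system would use a GNN library with learned weights.

import numpy as np

# Node features: 2 customer rows and 3 order rows, each with a 4-dim feature.
x = {"customer": np.random.randn(2, 4), "order": np.random.randn(3, 4)}

# Foreign-key edges (order -> customer): order i belongs to customer fk[i].
fk = np.array([0, 0, 1])

def message_passing_step(x, fk):
    """Aggregate order features into their parent customers and concatenate."""
    agg = np.zeros_like(x["customer"])
    counts = np.zeros(len(x["customer"]))
    for order_idx, cust_idx in enumerate(fk):
        agg[cust_idx] += x["order"][order_idx]
        counts[cust_idx] += 1
    agg /= np.maximum(counts, 1)[:, None]
    # Combine each customer's own features with the aggregated order messages.
    return np.concatenate([x["customer"], agg], axis=1)

print(message_passing_step(x, fk).shape)  # (2, 8)
```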
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates a relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
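As a simplified illustration of feature synthesis over a relational schema (this is not the GFS implementation; table and column names are hypothetical), child rows can be aggregated along foreign-key edges to produce features for the parent table:

```python
# Hypothetical sketch: synthesize per-customer features by aggregating order
# rows along the orders -> customers foreign-key edge.

import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2], "city": ["Delhi", "Mumbai"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "cust_id": [1, 1, 2],
    "amount": [900.0, 300.0, 150.0],
})

# Aggregate child-table values per parent key, then name the synthesized columns.
agg = orders.groupby("cust_id")["amount"].agg(["count", "mean", "sum"])
agg.columns = [f"orders_amount_{c}" for c in agg.columns]

# Attach the synthesized features to the parent table for a downstream predictor.
features = customers.merge(agg.reset_index(), on="cust_id", how="left")
print(features)
```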
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction [78.61546292830081]
We construct a large-scale human-annotated ERE dataset MAVEN-ERE with improved annotation schemes.
It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations.
Experiments show that ERE on MAVEN-ERE is quite challenging, and considering relation interactions with joint learning can improve performances.
arXiv Detail & Related papers (2022-11-14T13:34:49Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
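For reference, the Poincaré distance itself is standard; a minimal sketch (with made-up embeddings, not Proton's trained ones) is:

```python
# Standard Poincaré (hyperbolic) distance between two points inside the unit ball:
# d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
# The example vectors below are illustrative, not trained embeddings.

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points in the Poincaré ball model."""
    sq_diff = np.sum((u - v) ** 2)
    sq_u = np.sum(u ** 2)
    sq_v = np.sum(v ** 2)
    x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    return np.arccosh(x)

# E.g. probe whether a question-token embedding lies close to a schema-column embedding.
q_token = np.array([0.10, 0.20])
column = np.array([0.12, 0.18])
print(poincare_distance(q_token, column))
```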
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- Learning Relation-Specific Representations for Few-shot Knowledge Graph Completion [24.880078645503417]
We propose a Relation-Specific Context Learning framework, which exploits graph contexts of triples to capture semantic information of relations and entities simultaneously.
Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.
arXiv Detail & Related papers (2022-03-22T11:45:48Z)
- BERT Meets Relational DB: Contextual Representations of Relational Databases [4.029818252558553]
We address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables.
We look into ways of using these attention-based models to learn embeddings for entities in the relational database.
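One common way to do this, sketched here as an assumption rather than the paper's exact serialization, is to verbalize each row as text so a pretrained attention-based encoder can embed it:

```python
# Hedged sketch: serialize a relational row as a "column is value" sequence that
# a BERT-style encoder could consume. Table and column names are hypothetical.

def serialize_row(table_name, row):
    """Turn a row dict into a text sequence for a pretrained language model."""
    parts = [f"{col} is {val}" for col, val in row.items()]
    return f"[{table_name}] " + " ; ".join(parts)

row = {"cust_id": 1, "city": "Delhi", "signup_year": 2019}
text = serialize_row("customers", row)
print(text)  # "[customers] cust_id is 1 ; city is Delhi ; signup_year is 2019"
# The resulting string would then be tokenized and encoded, and the pooled
# output used as the entity's embedding.
```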
arXiv Detail & Related papers (2021-04-30T11:23:26Z)
- Document-Level Relation Extraction with Reconstruction [28.593318203728963]
We propose a novel encoder-classifier-reconstructor model for document-level relation extraction (DocRE).
The reconstructor reconstructs the ground-truth path dependencies from the graph representation, ensuring that the proposed DocRE model pays more attention to encoding entity pairs with relationships during training.
Experimental results on a large-scale DocRE dataset show that the proposed model can significantly improve the accuracy of relation extraction on a strong heterogeneous graph-based baseline.
arXiv Detail & Related papers (2020-12-21T14:29:31Z)
- HittER: Hierarchical Transformers for Knowledge Graph Embeddings [85.93509934018499]
We propose HittER to learn representations of entities and relations in a complex knowledge graph.
Experimental results show that HittER achieves new state-of-the-art results on multiple link prediction datasets.
We additionally propose a simple approach to integrate HittER into BERT and demonstrate its effectiveness on two Freebase factoid question answering datasets.
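For context on the link-prediction task itself, here is a deliberately simplified TransE-style scorer (a generic stand-in, not HittER's hierarchical Transformer; entities and embeddings are random and untrained):

```python
# Generic knowledge-graph link-prediction scoring, shown with a translational
# model: a triple (h, r, t) is plausible when h + r is close to t.
# Entities, relations, and dimensions here are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["paris", "france", "berlin", "germany"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(head, relation, tail):
    """Lower score = more plausible triple under the translation assumption h + r ≈ t."""
    return np.linalg.norm(entities[head] + relations[relation] - entities[tail])

# Link prediction: rank candidate tails for (paris, capital_of, ?).
# With untrained random embeddings the ranking is meaningless; training would
# push true triples toward low scores.
candidates = sorted(entities, key=lambda t: score("paris", "capital_of", t))
print(candidates[:2])
```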
arXiv Detail & Related papers (2020-08-28T18:58:15Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)