GFS: Graph-based Feature Synthesis for Prediction over Relational
Databases
- URL: http://arxiv.org/abs/2312.02037v1
- Date: Mon, 4 Dec 2023 16:54:40 GMT
- Title: GFS: Graph-based Feature Synthesis for Prediction over Relational
Databases
- Authors: Han Zhang, Quan Gan, David Wipf, Weinan Zhang
- Abstract summary: We propose a novel framework called Graph-based Feature Synthesis (GFS)
GFS formulates relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
- Score: 39.975491511390985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational databases are extensively utilized in a variety of modern
information system applications, and they always carry valuable data patterns.
There are a huge number of data mining or machine learning tasks conducted on
relational databases. However, it is worth noting that there are limited
machine learning models specifically designed for relational databases, as most
models are primarily tailored for single table settings. Consequently, the
prevalent approach for training machine learning models on data stored in
relational databases involves performing feature engineering to merge the data
from multiple tables into a single table and subsequently applying single table
models. This approach not only requires significant effort in feature
engineering but also destroys the inherent relational structure present in the
data. To address these challenges, we propose a novel framework called
Graph-based Feature Synthesis (GFS). GFS formulates the relational database as
a heterogeneous graph, thereby preserving the relational structure within the
data. By leveraging the inductive bias from single table models, GFS
effectively captures the intricate relationships inherent in each table.
Additionally, the whole framework eliminates the need for manual feature
engineering. In the extensive experiment over four real-world multi-table
relational databases, GFS outperforms previous methods designed for relational
databases, demonstrating its superior performance.
Related papers
- Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models [62.47618742274461]
We fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets.
Our replication achieves performance on par with or surpassing existing table LLMs.
We decouple the contributions of training data and the base model, providing insight into their individual impacts.
arXiv Detail & Related papers (2025-01-24T18:50:26Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving tasks over databases with graph neural networks.
We use RelBench to conduct the first comprehensive study of Deep Learning infrastructure.
RDL learns better whilst reducing human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z) - Differentially Private Synthetic Data Generation for Relational Databases [9.532509662034062]
We introduce the first-of-its-kind algorithm that can be combined with any existing differentially private (DP) synthetic data generation mechanisms.
Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors.
arXiv Detail & Related papers (2024-05-29T00:25:07Z) - IRG: Generating Synthetic Relational Databases using Deep Learning with Insightful Relational Understanding [13.724085637262654]
We propose incremental generator (IRG) that successfully handles ubiquitous real-life situations.
IRG ensures the preservation of relational schema integrity, offers a deep understanding of relationships beyond direct ancestors and descendants.
Experiments on three open-source real-life relational datasets in different fields at different scales demonstrate IRG's advantage in maintaining the synthetic data's relational schema validity and data fidelity and utility.
arXiv Detail & Related papers (2023-12-23T07:47:58Z) - Relational Deep Learning: Graph Representation Learning on Relational
Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z) - Generating Realistic Synthetic Relational Data through Graph Variational
Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z) - BERT Meets Relational DB: Contextual Representations of Relational
Databases [4.029818252558553]
We address the problem of learning low dimension representation of entities on relational databases consisting of multiple tables.
We look into ways of using these attention-based model to learn embeddings for entities in the relational database.
arXiv Detail & Related papers (2021-04-30T11:23:26Z) - On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embedding constitute of a naive approach to consider complete denormalization of the database by relationalizing the full join of all tables and representing as a knowledge graph.
In this paper we demonstrate; a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.