GFS: Graph-based Feature Synthesis for Prediction over Relational
Databases
- URL: http://arxiv.org/abs/2312.02037v1
- Date: Mon, 4 Dec 2023 16:54:40 GMT
- Title: GFS: Graph-based Feature Synthesis for Prediction over Relational
Databases
- Authors: Han Zhang, Quan Gan, David Wipf, Weinan Zhang
- Abstract summary: We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates the relational database as a heterogeneous graph, preserving the relational structure within the data.
In experiments over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational databases are extensively used in modern information
system applications and often carry valuable data patterns. A large number of
data mining and machine learning tasks are conducted on relational databases.
However, few machine learning models are specifically designed for relational
databases, as most models are primarily tailored to single-table settings.
Consequently, the
prevalent approach for training machine learning models on data stored in
relational databases involves performing feature engineering to merge the data
from multiple tables into a single table and subsequently applying single-table
models. This approach not only requires significant effort in feature
engineering but also destroys the inherent relational structure present in the
data. To address these challenges, we propose a novel framework called
Graph-based Feature Synthesis (GFS). GFS formulates the relational database as
a heterogeneous graph, thereby preserving the relational structure within the
data. By leveraging the inductive bias of single-table models, GFS
effectively captures the intricate relationships inherent in each table.
Additionally, the whole framework eliminates the need for manual feature
engineering. In extensive experiments over four real-world multi-table
relational databases, GFS outperforms previous methods designed for relational
databases, demonstrating its superior performance.
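To make the heterogeneous-graph formulation concrete, here is a minimal, hedged sketch (not the authors' released code): a hypothetical two-table database of users and orders, where each row becomes a typed node and each foreign-key reference becomes a typed edge, while per-table features stay in their original single-table form. All table, column, and edge-type names are invented for illustration.

```python
# Minimal sketch (not the authors' code): a toy two-table relational database
# turned into a heterogeneous graph, with one node type per table and one
# edge type per foreign-key relationship.
import pandas as pd

# Hypothetical tables: "users" and "orders", where orders.user_id is a
# foreign key into users.user_id.
users = pd.DataFrame({"user_id": [0, 1, 2], "age": [23, 31, 45]})
orders = pd.DataFrame({
    "order_id": [0, 1, 2, 3],
    "user_id":  [0, 0, 2, 1],      # foreign key -> users.user_id
    "amount":   [9.5, 3.0, 12.0, 7.25],
})

# One node per row, grouped by node type (i.e., by table).
nodes = {
    "user":  users["user_id"].tolist(),
    "order": orders["order_id"].tolist(),
}

# One directed edge per foreign-key reference, plus its reverse, so that
# information can later flow in both directions during message passing.
fk = list(zip(orders["order_id"], orders["user_id"]))
edges = {
    ("order", "placed_by", "user"): fk,
    ("user", "placed", "order"): [(u, o) for o, u in fk],
}

# Per-node features stay in their original single-table form, which is what
# allows single-table models to be applied to each node type separately.
features = {
    "user":  users[["age"]].to_numpy(),
    "order": orders[["amount"]].to_numpy(),
}

print(nodes)
print(edges)
print(features["order"].shape)
```

A graph built this way keeps the relational structure that a flat, feature-engineered merge would discard, which is the property the abstract emphasizes.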
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprising candidate generation, refinement, and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving tasks over databases with graph neural networks.
We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) infrastructure.
RDL learns better while reducing the human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
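As a toy companion to the Relational Deep Learning entry above, the sketch below performs one round of mean-aggregation message passing over a single typed edge set, reusing the invented users/orders example from the GFS sketch earlier. It illustrates the general technique only and is not code from the cited paper.

```python
# Toy sketch (not the cited paper's code): one mean-aggregation message-passing
# step over typed edges, reusing the hypothetical users/orders graph above.
import numpy as np

# Node features: users carry an "age" column, orders an "amount" column.
user_feat = np.array([[23.0], [31.0], [45.0]])          # shape (3, 1)
order_feat = np.array([[9.5], [3.0], [12.0], [7.25]])   # shape (4, 1)

# Edges of type (order, placed_by, user): (order_id, user_id) pairs.
placed_by = [(0, 0), (1, 0), (2, 2), (3, 1)]

# Each user aggregates the mean "amount" of the orders pointing at it.
agg = np.zeros((user_feat.shape[0], order_feat.shape[1]))
count = np.zeros(user_feat.shape[0])
for order_id, user_id in placed_by:
    agg[user_id] += order_feat[order_id]
    count[user_id] += 1
agg /= np.maximum(count, 1)[:, None]

# The updated user representation concatenates its own features with the
# aggregated neighbor message; a learned transform would normally follow.
user_hidden = np.concatenate([user_feat, agg], axis=1)
print(user_hidden)   # e.g. user 0 -> [23.0, mean(9.5, 3.0)]
```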
- Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
The self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- BERT Meets Relational DB: Contextual Representations of Relational Databases [4.029818252558553]
We address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables.
We look into ways of using attention-based models such as BERT to learn embeddings for entities in the relational database.
arXiv Detail & Related papers (2021-04-30T11:23:26Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
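To illustrate the grammar-augmented idea in the GraPPa entry above, here is a deliberately tiny, hedged sketch of a synchronous grammar: one rule expands a question template and a SQL template in lockstep, so every sampled pair is aligned by construction. The table, columns, and templates are invented and bear no relation to GraPPa's actual grammar or data pipeline.

```python
# Toy sketch (not GraPPa's pipeline): a one-rule synchronous context-free
# grammar that expands a question template and a SQL template in lockstep,
# yielding aligned question-SQL pairs over an invented table "singer".
import random

COLUMNS = ["age", "net_worth"]                 # invented numeric columns
AGGS = {"highest": "MAX", "average": "AVG"}    # NL word -> SQL aggregate

def sample_pair(rng):
    """Synchronously expand Q -> 'what is the AGG COL of all singers?'
    and S -> 'SELECT AGG(COL) FROM singer'."""
    col = rng.choice(COLUMNS)
    word, agg = rng.choice(sorted(AGGS.items()))
    question = f"what is the {word} {col} of all singers?"
    sql = f"SELECT {agg}({col}) FROM singer"
    return question, sql

rng = random.Random(0)
for _ in range(3):
    question, sql = sample_pair(rng)
    print(question, "->", sql)
```

Pairs generated this way can feed pre-training objectives over tables, alongside the masked language modeling mentioned in the summary.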
- On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embeddings take a naive approach, considering a complete denormalization of the database by taking the full join of all tables and representing it as a knowledge graph.
In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table, while using relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)
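For contrast, the "naive" baseline described in the last entry, complete denormalization via the full join of all tables, can be pictured with the short hedged sketch below. The schema is invented; the point is only that every parent row gets duplicated once per child row and the explicit table structure collapses into a single flat table.

```python
# Toy sketch of the naive baseline discussed above: materializing the full
# join of all tables into one flat, denormalized table (invented schema).
import pandas as pd

customers = pd.DataFrame({"customer_id": [0, 1], "city": ["Oslo", "Kyoto"]})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [0, 0, 1],
                       "amount": [5.0, 7.5, 2.0]})
items = pd.DataFrame({"order_id": [10, 10, 11, 12],
                      "product": ["pen", "ink", "pen", "notebook"]})

# Full join across the foreign-key chain: items -> orders -> customers.
# Parent rows are repeated once per child row, which is the redundancy the
# cited summary characterizes as naive.
flat = items.merge(orders, on="order_id").merge(customers, on="customer_id")
print(flat)
```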