Related papers: GFS: Graph-based Feature Synthesis for Prediction over Relational Databases

URL: http://arxiv.org/abs/2312.02037v1
Date: Mon, 4 Dec 2023 16:54:40 GMT
Title: GFS: Graph-based Feature Synthesis for Prediction over Relational Databases
Authors: Han Zhang, Quan Gan, David Wipf, Weinan Zhang
Abstract summary: We propose a novel framework called Graph-based Feature Synthesis (GFS). GFS formulates the relational database as a heterogeneous graph. In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
Score: 39.975491511390985
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Relational databases are extensively utilized in a variety of modern information system applications, and they always carry valuable data patterns. There are a huge number of data mining or machine learning tasks conducted on relational databases. However, it is worth noting that there are limited machine learning models specifically designed for relational databases, as most models are primarily tailored for single table settings. Consequently, the prevalent approach for training machine learning models on data stored in relational databases involves performing feature engineering to merge the data from multiple tables into a single table and subsequently applying single table models. This approach not only requires significant effort in feature engineering but also destroys the inherent relational structure present in the data. To address these challenges, we propose a novel framework called Graph-based Feature Synthesis (GFS). GFS formulates the relational database as a heterogeneous graph, thereby preserving the relational structure within the data. By leveraging the inductive bias from single table models, GFS effectively captures the intricate relationships inherent in each table. Additionally, the whole framework eliminates the need for manual feature engineering. In the extensive experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases, demonstrating its superior performance.
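The graph formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the GFS implementation: the toy tables, the column names, and the helper `build_hetero_graph` are all hypothetical, and the sketch assumes only that each table becomes a node type and each foreign-key reference becomes a typed edge.

```python
from collections import defaultdict

# Hypothetical toy database: two tables linked by a foreign key.
customers = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 5.0},
    {"order_id": 12, "customer_id": 2, "amount": 7.5},
]


def build_hetero_graph(tables, primary_keys, foreign_keys):
    """Turn tables into typed nodes and typed edges (illustrative only).

    tables:       {table_name: list of row dicts}
    primary_keys: {table_name: primary-key column}
    foreign_keys: list of (child_table, fk_column, parent_table)
    """
    # One node type per table; each row is a node keyed by its primary key.
    nodes = {}
    for name, rows in tables.items():
        pk = primary_keys[name]
        nodes[name] = {row[pk]: row for row in rows}

    # One edge type per foreign key; each reference is a (child, parent) edge.
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        child_pk = primary_keys[child]
        for row in tables[child]:
            if row[fk_col] in nodes[parent]:
                edges[(child, fk_col, parent)].append((row[child_pk], row[fk_col]))
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(
    {"customers": customers, "orders": orders},
    {"customers": "customer_id", "orders": "order_id"},
    [("orders", "customer_id", "customers")],
)
```

Unlike joining the tables into one flat table, this representation keeps every row as its own node, so the one-to-many link between a customer and its orders stays explicit.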

Related papers

Boosting Relational Deep Learning with Pretrained Tabular Models [18.34233986830027]
Graph Neural Networks (GNNs) offer a compelling alternative by inherently modeling these relationships. Our framework achieves up to $33\%$ performance improvement and a $526\times$ inference speedup compared to GNNs.
arXiv Detail & Related papers (2025-04-07T11:19:04Z)
LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation [49.898152180805454]
This study is the first to explicitly address inter-column relationship preservation in synthetic tabular data generation. LLM-TabFlow is a novel approach that captures complex inter-column relationships and compresses data, while using score-based diffusion to model the distribution of the compressed data in latent space. Our results show that LLM-TabFlow outperforms all baselines, fully preserving inter-column relationships while achieving the best balance between data fidelity, utility, and privacy.
arXiv Detail & Related papers (2025-03-04T00:47:52Z)
Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models [62.47618742274461]
We fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs. We decouple the contributions of training data and the base model, providing insight into their individual impacts.
arXiv Detail & Related papers (2025-01-24T18:50:26Z)
Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring. Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations. Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving tasks over databases with graph neural networks. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL). RDL learns better models while reducing the human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z)
Differentially Private Synthetic Data Generation for Relational Databases [9.532509662034062]
We introduce a first-of-its-kind algorithm that can be combined with any existing differentially private (DP) synthetic data generation mechanism. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors.
arXiv Detail & Related papers (2024-05-29T00:25:07Z)
IRG: Generating Synthetic Relational Databases using Deep Learning with Insightful Relational Understanding [13.724085637262654]
We propose an incremental generator (IRG) that successfully handles ubiquitous real-life situations. IRG ensures the preservation of relational schema integrity and offers a deep understanding of relationships beyond direct ancestors and descendants. Experiments on three open-source real-life relational datasets in different fields at different scales demonstrate IRG's advantage in maintaining the synthetic data's relational schema validity, data fidelity, and utility.
arXiv Detail & Related papers (2023-12-23T07:47:58Z)
Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
A self-training framework generates diverse synthetic data with complex logic. We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios. UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases. The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
BERT Meets Relational DB: Contextual Representations of Relational Databases [4.029818252558553]
We address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables. We look into ways of using attention-based models to learn embeddings for entities in the relational database.
arXiv Detail & Related papers (2021-04-30T11:23:26Z)
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar. To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding. Recent methods for learning an embedding take the naive approach of completely denormalizing the database by materializing the full join of all tables and representing it as a knowledge graph. In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.