Graph Conditional Flow Matching for Relational Data Generation
- URL: http://arxiv.org/abs/2505.15668v1
- Date: Wed, 21 May 2025 15:45:15 GMT
- Title: Graph Conditional Flow Matching for Relational Data Generation
- Authors: Davide Scassola, Sebastiano Saccani, Luca Bortolussi
- Abstract summary: We propose a generative model for relational data that generates the content of a relational dataset given the graph formed by the foreign-key relationships. We do this by learning a deep generative model of the content of the whole relational database by flow matching. Our method is flexible, as it can support relational datasets with complex structures, and expressive, as the generation of each record can be influenced by any other record within the same connected component.
- Score: 0.8823131482758475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data synthesis is gaining momentum as a privacy-enhancing technology. While single-table tabular data generation has seen considerable progress, current methods for multi-table data often lack the flexibility and expressiveness needed to capture complex relational structures. In particular, they struggle with long-range dependencies and complex foreign-key relationships, such as tables with multiple parent tables or multiple types of links between the same pair of tables. We propose a generative model for relational data that generates the content of a relational dataset given the graph formed by the foreign-key relationships. We do this by learning a deep generative model of the content of the whole relational database by flow matching, where the neural network trained to denoise records leverages a graph neural network to obtain information from connected records. Our method is flexible, as it can support relational datasets with complex structures, and expressive, as the generation of each record can be influenced by any other record within the same connected component. We evaluate our method on several benchmark datasets and show that it achieves state-of-the-art performance in terms of synthetic data fidelity.
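The training signal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear interpolation path, the single time value shared by all records, the undirected mean aggregation, and the linear map standing in for the GNN-based denoiser are all assumptions made for illustration, and the names `neighbor_mean`, `flow_matching_loss`, and `weights` are hypothetical.

```python
import numpy as np


def neighbor_mean(x, edges):
    """One message-passing step: average the features of foreign-key neighbors."""
    agg = np.zeros_like(x, dtype=float)
    deg = np.zeros(x.shape[0])
    for i, j in edges:  # assumption: links are treated as undirected
        agg[i] += x[j]
        agg[j] += x[i]
        deg[i] += 1
        deg[j] += 1
    return agg / np.maximum(deg, 1.0)[:, None]  # isolated records get a zero vector


def flow_matching_loss(x1, edges, weights, rng):
    """Squared error between predicted and target velocity for one noisy sample."""
    x0 = rng.standard_normal(x1.shape)   # noise sample, one row per record
    t = rng.uniform()                    # assumption: one time shared by all records
    xt = (1.0 - t) * x0 + t * x1         # assumption: linear interpolation path
    target = x1 - x0                     # velocity of that linear path
    t_col = np.full((x1.shape[0], 1), t)
    features = np.concatenate([xt, neighbor_mean(xt, edges), t_col], axis=1)
    pred = features @ weights            # linear stand-in for the GNN-based denoiser
    return float(np.mean((pred - target) ** 2))


# Toy database: 4 records with 3 numeric columns and 3 foreign-key links.
rng = np.random.default_rng(0)
records = rng.standard_normal((4, 3))
edges = [(0, 1), (0, 2), (2, 3)]
weights = np.zeros((2 * 3 + 1, 3))       # untrained stand-in parameters
loss = flow_matching_loss(records, edges, weights, rng)
```

Because each record's prediction depends on its neighbors' noisy values, stacking several such aggregation steps would let information travel across a whole connected component, which is the expressiveness property the abstract highlights.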
Related papers
- Generating Synthetic Relational Tabular Data via Structural Causal Models [0.0]
We develop a novel framework that generates realistic synthetic relational data including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table dependencies mimicking real-world scenarios.
arXiv Detail & Related papers (2025-07-04T12:27:23Z)
- Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures [50.46688111973999]
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data. We present a new blueprint that enables end-to-end representation of 'relational entity graphs' without traditional feature engineering. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data.
arXiv Detail & Related papers (2025-06-19T23:51:38Z)
- RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models [83.6013616017646]
RelDiff is a novel diffusion generative model that synthesizes complete relational databases by explicitly modeling their foreign key graph structure. RelDiff consistently outperforms prior methods in producing realistic and coherent synthetic relational databases.
arXiv Detail & Related papers (2025-05-31T21:01:02Z)
- Boosting Relational Deep Learning with Pretrained Tabular Models [18.34233986830027]
Graph Neural Networks (GNNs) offer a compelling alternative by inherently modeling these relationships. Our framework achieves up to 33% performance improvement and a 526× inference speedup compared to GNNs.
arXiv Detail & Related papers (2025-04-07T11:19:04Z)
- LLM-TabFlow: Synthetic Tabular Data Generation with Inter-column Logical Relationship Preservation [49.898152180805454]
This study is the first to explicitly address inter-column relationship preservation in synthetic tabular data generation. LLM-TabFlow is a novel approach that captures complex inter-column relationships and compresses data, while using Score-based Diffusion to model the distribution of the compressed data in latent space. Our results show that LLM-TabFlow outperforms all baselines, fully preserving inter-column relationships while achieving the best balance between data fidelity, utility, and privacy.
arXiv Detail & Related papers (2025-03-04T00:47:52Z)
- Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web [1.02801486034657]
We study dataset relationships from the perspective of users who discover, use, and share datasets on the Web.
We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery.
We demonstrate that machine-learning based methods that use dataset metadata achieve multi-class classification accuracy of 90%.
arXiv Detail & Related papers (2024-08-26T21:00:25Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates a relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction [78.61546292830081]
We construct a large-scale human-annotated ERE dataset MAVEN-ERE with improved annotation schemes.
It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations.
Experiments show that ERE on MAVEN-ERE is quite challenging, and considering relation interactions with joint learning can improve performances.
arXiv Detail & Related papers (2022-11-14T13:34:49Z)
- ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy [65.5580334698777]
ViRel is a method for unsupervised discovery and learning of Visual Relations with graph-level analogy.
We show that our method achieves above 95% accuracy in relation classification.
It further generalizes to unseen tasks with more complicated relational structures.
arXiv Detail & Related papers (2022-07-04T16:56:45Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embeddings take a naive approach: they fully denormalize the database by relationalizing the full join of all tables and representing it as a knowledge graph.
In this paper, we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)