Adapting Differentially Private Synthetic Data to Relational Databases
- URL: http://arxiv.org/abs/2405.18670v1
- Date: Wed, 29 May 2024 00:25:07 GMT
- Title: Adapting Differentially Private Synthetic Data to Relational Databases
- Authors: Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan
- Abstract summary: We introduce the first-of-its-kind algorithm that can be combined with any existing differentially private (DP) synthetic data generation mechanism.
Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors.
- Score: 9.532509662034062
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce the first-of-its-kind algorithm that can be combined with any existing DP mechanism to generate synthetic relational databases. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity. Finally, we provide both DP and theoretical utility guarantees for our algorithm.
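
To make the two quantities named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' algorithm and it is not differentially private. It only illustrates, on toy data, what a cross-table low-order marginal is and what a referential integrity check looks like. All table contents, column names (`id`, `pid`, `region`, `item`), and function names are hypothetical.

```python
from collections import Counter

def cross_table_marginal(parent, child, fk, parent_col, child_col):
    """Normalized 2-way marginal of (parent_col, child_col) over the parent-child join."""
    parent_by_id = {row["id"]: row for row in parent}
    counts = Counter(
        (parent_by_id[row[fk]][parent_col], row[child_col]) for row in child
    )
    total = sum(counts.values())  # assumes the child table is non-empty
    return {key: n / total for key, n in counts.items()}

def l1_error(p, q):
    """L1 distance between two marginals stored as dicts."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def has_referential_integrity(parent, child, fk):
    """True if every foreign key in the child table points at an existing parent row."""
    parent_ids = {row["id"] for row in parent}
    return all(row[fk] in parent_ids for row in child)

# Hypothetical toy database: one parent table and one child table.
parent = [{"id": 1, "region": "N"}, {"id": 2, "region": "S"}]
real_child = [
    {"pid": 1, "item": "a"}, {"pid": 1, "item": "b"},
    {"pid": 2, "item": "a"}, {"pid": 2, "item": "a"},
]
# Same per-table value counts as real_child, but different parent-child links.
synth_child = [
    {"pid": 1, "item": "a"}, {"pid": 2, "item": "b"},
    {"pid": 2, "item": "a"}, {"pid": 1, "item": "a"},
]

real_marginal = cross_table_marginal(parent, real_child, "pid", "region", "item")
synth_marginal = cross_table_marginal(parent, synth_child, "pid", "region", "item")
error = l1_error(real_marginal, synth_marginal)
integrity_ok = has_referential_integrity(parent, synth_child, "pid")
```

In this toy case both child tables have identical single-table statistics, yet the cross-table marginal differs, so the error depends only on how child rows are linked to parent rows. That linkage is the "relationship between individual synthetic tables" that the abstract says is refined.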
Related papers
- 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs [67.47600679176963]
RDBs store vast amounts of rich, informative data spread across interconnected tables.
The progress of predictive machine learning models falls behind advances in other domains such as computer vision or natural language processing.
We explore a class of baseline models predicated on converting multi-table datasets into graphs.
We assemble (i) a diverse collection of large-scale RDB datasets and (ii) coincident predictive tasks.
arXiv Detail & Related papers (2024-04-28T15:04:54Z)
- A Comparison of SynDiffix Multi-table versus Single-table Synthetic Data [0.7252027234425334]
SynDiffix is a new open-source tool for structured data synthesis.
It has anonymization features that allow it to generate multiple synthetic tables while maintaining strong anonymity.
This paper compares SynDiffix with 15 other synthetic data techniques using the SDNIST analysis framework.
arXiv Detail & Related papers (2024-03-13T12:26:50Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates the relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- Privately generating tabular data using language models [80.67328256105891]
Privately generating synthetic data from a table is an important brick of a privacy-first world.
We propose and investigate a simple approach of treating each row in a table as a sentence and training a language model with differential privacy.
arXiv Detail & Related papers (2023-06-07T21:53:14Z)
- Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms [30.330715718619874]
Marginal-based methods achieve promising performance in the synthetic data competition hosted by the National Institute of Standards and Technology (NIST).
Despite their promising performance in practice, the statistical properties of marginal-based methods are rarely studied in the literature.
arXiv Detail & Related papers (2023-01-21T01:32:58Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- Row Conditional-TGAN for generating synthetic relational databases [0.0]
We propose the Row Conditional-Tabular Generative Adversarial Network (RC-TGAN) to support modeling and synthesizing relational databases.
The RC-TGAN models relationship information between tables by incorporating conditional data of parent rows into the design of the child table's GAN.
arXiv Detail & Related papers (2022-11-14T18:14:18Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing with a method, called SUN, that explores the intrinsic uncertainties in neural-network-based approaches.
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning such embeddings take a naive approach: they fully denormalize the database by materializing the full join of all tables and representing the result as a knowledge graph.
In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.