Effective and Privacy preserving Tabular Data Synthesizing
- URL: http://arxiv.org/abs/2108.10064v1
- Date: Wed, 11 Aug 2021 13:55:48 GMT
- Title: Effective and Privacy preserving Tabular Data Synthesizing
- Authors: Aditya Kunar
- Abstract summary: We develop a novel conditional table GAN architecture that can model diverse data types with complex distributions.
We train CTAB-GAN with strict privacy guarantees to ensure greater security for training GANs against malicious privacy attacks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While data sharing is crucial for knowledge development, privacy concerns and
strict regulation (e.g., the European General Data Protection Regulation (GDPR))
unfortunately limit its full effectiveness. Synthetic tabular data emerges as
an alternative that enables data sharing while fulfilling regulatory and privacy
constraints. State-of-the-art tabular data synthesizers draw their methodologies
from Generative Adversarial Networks (GANs). In this thesis, we develop
CTAB-GAN, a novel conditional table GAN architecture that can effectively model
diverse data types with complex distributions. CTAB-GAN is extensively
evaluated against state-of-the-art GANs that generate synthetic tables, in
terms of data similarity and analysis utility. The results on five datasets
show that the synthetic data of CTAB-GAN closely resembles the real data for
all three types of variables and yields higher accuracy for five machine
learning algorithms, by up to 17%.
Additionally, to better secure the training of tabular GANs against
malicious privacy attacks, differential privacy (DP) is studied and used to
train CTAB-GAN with strict privacy guarantees. DP-CTAB-GAN is rigorously
evaluated against state-of-the-art DP tabular GANs in terms of data utility and
privacy robustness against membership and attribute inference attacks. Our
results on three datasets indicate that strict theoretical differential privacy
guarantees come only at the cost of severely degraded data utility. However, we
show empirically that these guarantees provide a stronger defence against
privacy attacks. Overall, DP-CTAB-GAN remains robust to privacy attacks while
maintaining the highest data utility compared to prior work, by up to 18% in
terms of the average precision score.
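A common way to obtain such differential privacy guarantees when training a tabular GAN is DP-SGD applied to the discriminator: clip each per-sample gradient to a norm bound C, average, and add Gaussian noise calibrated to C before the optimizer step. The sketch below is a minimal PyTorch illustration of that recipe on a toy discriminator; the network shape, clipping bound, noise multiplier, and loss are illustrative assumptions, not the configuration used for DP-CTAB-GAN.

```python
# Minimal sketch of a DP-SGD-style discriminator update for a tabular GAN:
# per-sample gradient clipping plus Gaussian noise. All hyperparameters
# (network width, clip_norm, noise_multiplier) are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

n_features = 16          # width of the encoded tabular rows (assumed)
clip_norm = 1.0          # per-sample gradient clipping bound C
noise_multiplier = 1.1   # sigma; noise std is sigma * C

disc = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def dp_discriminator_step(real_batch, fake_batch):
    """One discriminator update with per-sample clipping and Gaussian noise."""
    samples = torch.cat([real_batch, fake_batch])
    labels = torch.cat([torch.ones(len(real_batch), 1),
                        torch.zeros(len(fake_batch), 1)])
    summed = [torch.zeros_like(p) for p in disc.parameters()]
    for x, y in zip(samples, labels):
        disc.zero_grad()
        loss_fn(disc(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in disc.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # ||g|| <= C
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)
    disc.zero_grad()
    for p, acc in zip(disc.parameters(), summed):
        noise = torch.randn_like(acc) * noise_multiplier * clip_norm
        p.grad = (acc + noise) / len(samples)   # noisy averaged gradient
    opt.step()

# Example: one update on random stand-ins for encoded real and generated rows.
dp_discriminator_step(torch.randn(32, n_features), torch.randn(32, n_features))
```

The privacy budget (epsilon, delta) spent by a sequence of such noisy updates is then tracked with a privacy accountant; larger noise multipliers tighten the guarantee but degrade utility, which is the trade-off reported above.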
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Quantifying and Mitigating Privacy Risks for Tabular Generative Models [13.153278585144355]
Synthetic data from generative models emerges as a privacy-preserving data-sharing solution.
We propose DP-TLDM, Differentially Private Tabular Latent Diffusion Model.
We show that DP-TLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability.
arXiv Detail & Related papers (2024-03-12T17:27:49Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
However, FedIT faces limitations such as the scarcity of instruction data and the risk of training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z) - DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z) - CTAB-GAN+: Enhancing Tabular Data Synthesis [11.813626861559904]
CTAB-GAN+ improves upon state-of-the-art GANs by adding downstream losses to conditional GANs to synthesize higher-utility data.
We show CTAB-GAN+ synthesizes privacy-preserving data with at least 48.16% higher utility across multiple datasets and learning tasks under different privacy budgets.
arXiv Detail & Related papers (2022-04-01T12:52:30Z) - DTGAN: Differential Private Training for Tabular GANs [6.174448419090292]
We propose DTGAN, a novel conditional Wasserstein GAN that comes in two variants, DTGAN_G and DTGAN_D.
We empirically evaluate the theoretical privacy guarantees offered by DP against membership and attribute inference attacks.
Our results on 3 datasets show that the DP-SGD framework is superior to PATE and that a DP discriminator is better suited for training convergence.
arXiv Detail & Related papers (2021-07-06T10:28:05Z) - CTAB-GAN: Effective Table Data Synthesizing [7.336728307626645]
We develop CTAB-GAN, a conditional table GAN architecture that can model diverse data types.
We show that CTAB-GAN's synthetic data closely resembles the real data for all three types of variables and results in higher accuracy for five machine learning algorithms, by up to 17%.
arXiv Detail & Related papers (2021-02-16T18:53:57Z) - Synthesizing Property & Casualty Ratemaking Datasets using Generative
Adversarial Networks [2.2649197740853677]
We show how to design three types of generative adversarial networks (GANs) that can build a synthetic insurance dataset from a confidential original dataset.
For transparency, the approaches are illustrated using a public dataset, the French motor third party liability data.
We find that the MC-WGAN-GP synthesizes the best data, the CTGAN is the easiest to use, and the MNCDP-GAN guarantees differential privacy.
arXiv Detail & Related papers (2020-08-13T21:02:44Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
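Several of the evaluations above (e.g., for DP-CTAB-GAN and DTGAN) measure privacy robustness against membership inference attacks. One simple form of such an attack scores each candidate record by its distance to the nearest synthetic record and flags unusually close records as likely training members. Below is a minimal sketch of that distance-based baseline; the data, threshold rule, and accuracy metric are illustrative assumptions, not the attack models used in the papers above.

```python
# Minimal sketch of a distance-based membership inference baseline against a
# tabular generative model: predict "member" when a candidate record lies
# unusually close to some synthetic record. Data and threshold are assumptions.
import numpy as np

rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1000, 8))            # stand-in for generated rows
members = rng.normal(size=(200, 8))               # candidates that were in training
non_members = rng.normal(loc=0.5, size=(200, 8))  # candidates that were not

def min_distance(candidates, reference):
    """Distance from each candidate row to its closest reference row."""
    diffs = candidates[:, None, :] - reference[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

# Higher score = closer to the synthetic data = more likely a training member.
scores = -np.concatenate([min_distance(members, synthetic),
                          min_distance(non_members, synthetic)])
labels = np.concatenate([np.ones(len(members)), np.zeros(len(non_members))])

# Attack accuracy at a simple median-score threshold.
preds = scores > np.median(scores)
print("membership attack accuracy:", (preds == labels).mean())
```

An attack accuracy near 0.5 indicates that the synthetic data leaks little membership signal; values approaching 1.0 indicate the kind of leakage that DP training is meant to suppress.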