DTGAN: Differential Private Training for Tabular GANs
- URL: http://arxiv.org/abs/2107.02521v2
- Date: Thu, 8 Jul 2021 12:27:04 GMT
- Title: DTGAN: Differential Private Training for Tabular GANs
- Authors: Aditya Kunar, Robert Birke, Zilong Zhao, Lydia Chen
- Abstract summary: We propose DTGAN, a novel conditional Wasserstein tabular GAN that comes in two variants, DTGAN_G and DTGAN_D.
We rigorously evaluate the theoretical privacy guarantees offered by DP empirically against membership and attribute inference attacks.
Our results on 3 datasets show that the DP-SGD framework is superior to PATE and that applying DP to the discriminator yields better training convergence.
- Score: 6.174448419090292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular generative adversarial networks (TGANs) have recently emerged to cater to the need for synthesizing tabular data -- the most widely used data format. While synthetic tabular data offers the advantage of complying with privacy regulations, there still exists a risk of privacy leakage via inference attacks due to interpolating the properties of real data during training. Differentially private (DP) training algorithms provide theoretical guarantees for training machine learning models by injecting statistical noise to prevent privacy leaks. However, the challenge of applying DP to TGANs is determining the optimal framework (i.e., PATE/DP-SGD) and neural network (i.e., Generator/Discriminator) in which to inject noise such that data utility is well maintained under a given privacy guarantee. In this paper, we propose DTGAN, a novel conditional Wasserstein tabular GAN that comes in two variants, DTGAN_G and DTGAN_D, to provide a detailed comparison of tabular GANs trained with DP-SGD applied to the generator vs. the discriminator, respectively. We elicit the privacy analysis associated with training the generator with the complex loss functions (i.e., classification and information losses) needed for high-quality tabular data synthesis. Additionally, we rigorously evaluate the theoretical privacy guarantees offered by DP empirically against membership and attribute inference attacks. Our results on 3 datasets show that the DP-SGD framework is superior to PATE and that applying DP to the discriminator yields better training convergence. Thus, we find that (i) DTGAN_D maintains the highest data utility across 4 ML models, improving the average precision score by up to 18% under a strict privacy budget (epsilon = 1) compared to prior studies, and (ii) DP effectively prevents privacy loss from inference attacks by restricting the success probability of membership attacks to close to 50%, i.e., random guessing.
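To make the DTGAN_D variant concrete: applying DP-SGD to the discriminator amounts to clipping each example's gradient of the Wasserstein critic loss and adding calibrated Gaussian noise before the optimizer step, while the generator is trained normally. The following is a minimal illustrative PyTorch sketch, not the authors' implementation; the layer sizes, clipping bound CLIP_NORM, and noise multiplier NOISE_MULT are placeholder assumptions, and the conditional vector used for tabular column conditioning is omitted for brevity.

```python
# Minimal sketch: one DP-SGD update for a WGAN critic (DTGAN_D-style).
# Per-sample gradients are clipped to CLIP_NORM, summed, perturbed with
# Gaussian noise of std NOISE_MULT * CLIP_NORM, then averaged.
import torch
import torch.nn as nn

CLIP_NORM = 1.0    # per-sample L2 clipping bound C (assumed value)
NOISE_MULT = 1.1   # noise multiplier sigma (assumed value)

critic = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
opt_d = torch.optim.Adam(critic.parameters(), lr=1e-4)

def dp_sgd_critic_step(real_batch: torch.Tensor) -> None:
    """One differentially private update of the critic (Wasserstein loss)."""
    opt_d.zero_grad()
    summed = [torch.zeros_like(p) for p in critic.parameters()]
    for x_real in real_batch:                      # microbatches of size 1
        z = torch.randn(1, 8)
        x_fake = generator(z).detach()             # generator is not noised
        # Critic loss to minimize: E[D(fake)] - E[D(real)]
        loss = critic(x_fake).mean() - critic(x_real.unsqueeze(0)).mean()
        grads = torch.autograd.grad(loss, list(critic.parameters()))
        # Clip this example's gradient to L2 norm <= CLIP_NORM
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (CLIP_NORM / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    # Add noise calibrated to the clipping bound, then average over the batch
    for p, s in zip(critic.parameters(), summed):
        noise = torch.randn_like(s) * NOISE_MULT * CLIP_NORM
        p.grad = (s + noise) / real_batch.shape[0]
    opt_d.step()

dp_sgd_critic_step(torch.randn(32, 16))  # toy batch of 32 encoded tabular rows
```

In practice the noise multiplier is chosen with a privacy accountant (e.g., an RDP/moments accountant) so that the cumulative cost of all discriminator updates stays within the target budget, such as the epsilon = 1 setting reported in the paper.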
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding machine learning datasets at industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for gauging the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds.
We obtain bounds for membership inference that match state-of-the-art techniques (a generic membership-inference bound of this kind is sketched after this list).
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - CTAB-GAN+: Enhancing Tabular Data Synthesis [11.813626861559904]
CTAB-GAN+ improves upon state-of-the-art GANs by adding downstream losses to conditional GANs for higher-utility synthetic data.
We show CTAB-GAN+ synthesizes privacy-preserving data with at least 48.16% higher utility across multiple datasets and learning tasks under different privacy budgets.
arXiv Detail & Related papers (2022-04-01T12:52:30Z) - DP-SGD vs PATE: Which Has Less Disparate Impact on GANs? [0.0]
We compare GANs trained with the two best-known DP frameworks for deep learning, DP-SGD, and PATE, in different data imbalance settings.
Our experiments consistently show that for PATE, unlike DP-SGD, the privacy-utility trade-off is not monotonically decreasing.
arXiv Detail & Related papers (2021-11-26T17:25:46Z) - Effective and Privacy preserving Tabular Data Synthesizing [0.0]
We develop a novel conditional table GAN architecture that can model diverse data types with complex distributions.
We train CTAB-GAN with strict privacy guarantees to ensure greater security for training GANs against malicious privacy attacks.
arXiv Detail & Related papers (2021-08-11T13:55:48Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
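Relating the closed-form record-level inference bounds above to the abstract's finding that DP keeps membership-attack success near 50%: the hypothesis-testing view of (epsilon, delta)-DP implies that any membership inference attack with a balanced prior has accuracy at most 1 - (1 - delta) / (1 + e^epsilon). The sketch below evaluates this generic bound; it is illustrative only and is not the specific bound derived in the cited paper.

```python
# Generic upper bound on balanced-prior membership-inference accuracy under
# (epsilon, delta)-DP. From the hypothesis-testing characterization, the
# attack's type-I/II error rates satisfy alpha + e^eps * beta >= 1 - delta
# and e^eps * alpha + beta >= 1 - delta, hence
#   accuracy = 1 - (alpha + beta) / 2 <= 1 - (1 - delta) / (1 + e^eps).
import math

def mi_accuracy_bound(epsilon: float, delta: float = 0.0) -> float:
    """Upper bound on membership-attack accuracy under (epsilon, delta)-DP."""
    return 1.0 - (1.0 - delta) / (1.0 + math.exp(epsilon))

for eps in (0.1, 1.0, 3.0, 10.0):
    print(f"epsilon={eps:5.1f}: attack accuracy <= {mi_accuracy_bound(eps):.3f}")
# epsilon = 1 caps any such attack at about 73% accuracy, and the bound tends
# to 50% (random guessing) as epsilon approaches 0, consistent with the
# near-50% attack success observed empirically for DTGAN.
```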
This list is automatically generated from the titles and abstracts of the papers on this site.