Differentially Private Synthetic Data Generation via
Lipschitz-Regularised Variational Autoencoders
- URL: http://arxiv.org/abs/2304.11336v2
- Date: Thu, 13 Jul 2023 06:57:02 GMT
- Title: Differentially Private Synthetic Data Generation via
Lipschitz-Regularised Variational Autoencoders
- Authors: Benedikt Groß, Gerhard Wunder
- Abstract summary: It is often overlooked that generative models are prone to memorising many details of individual training records.
In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models.
- Score: 3.7463972693041274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic data has been hailed as the silver bullet for privacy preserving
data analysis. If a record is not real, then how could it violate a person's
privacy? In addition, deep-learning based generative models are employed
successfully to approximate complex high-dimensional distributions from data
and draw realistic samples from this learned distribution. It is often
overlooked though that generative models are prone to memorising many details
of individual training records and often generate synthetic data that too
closely resembles the underlying sensitive training data, hence violating
strong privacy regulations as, e.g., encountered in health care. Differential
privacy is the well-known state-of-the-art framework for guaranteeing
protection of sensitive individuals' data, allowing aggregate statistics and
even machine learning models to be released publicly without compromising
privacy. The training mechanisms however often add too much noise during the
training process, and thus severely compromise the utility of these private
models. Even worse, the tight privacy budgets do not allow for many training
epochs so that model quality cannot be properly controlled in practice. In this
paper we explore an alternative approach for privately generating data that
makes direct use of the inherent stochasticity in generative models, e.g.,
variational autoencoders. The main idea is to appropriately constrain the
continuity modulus of the deep models instead of adding another noise mechanism
on top. For this approach, we derive mathematically rigorous privacy guarantees
and illustrate its effectiveness with practical experiments.
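The core idea, constraining the continuity (Lipschitz) modulus of the networks instead of adding a separate noise mechanism, can be illustrated with spectral normalisation, one standard way of bounding per-layer operator norms. The sketch below is a rough illustration under assumed architecture, dimensions, and loss, not the authors' implementation; the paper's privacy guarantee comes from its analysis of the constrained model, not from this particular code.

```python
# Minimal sketch (not the authors' code): a small VAE whose encoder/decoder
# layers are spectrally normalised so that each affine map has operator norm
# at most 1, giving a crude bound on the network's Lipschitz constant.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class LipschitzVAE(nn.Module):
    def __init__(self, data_dim=32, latent_dim=8, hidden=64):
        super().__init__()
        # Spectral normalisation caps each weight matrix's largest singular
        # value; with 1-Lipschitz activations (ReLU), the composed network's
        # Lipschitz constant is bounded by the product of per-layer norms.
        self.enc = nn.Sequential(spectral_norm(nn.Linear(data_dim, hidden)), nn.ReLU())
        self.mu = spectral_norm(nn.Linear(hidden, latent_dim))
        self.logvar = spectral_norm(nn.Linear(hidden, latent_dim))
        self.dec = nn.Sequential(
            spectral_norm(nn.Linear(latent_dim, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, data_dim)),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation: the decoder only ever sees a noisy latent code,
        # which is the "inherent stochasticity" the abstract refers to.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Because sampling passes Gaussian noise through a Lipschitz-bounded decoder, the influence of any single training record on a released sample can be bounded directly, which is broadly the route the abstract describes instead of adding gradient noise on top.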
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z) - Privacy-Preserving Student Learning with Differentially Private Data-Free Distillation [35.37005050907983]
We present an effective teacher-student learning approach to train privacy-preserving deep learning models.
Large amounts of synthetic data can be generated for model training without exposing private data.
A student is trained on the synthetic data with the supervision of private labels.
arXiv Detail & Related papers (2024-09-19T01:00:18Z) - Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation [24.868697898254368]
Deep models may pose a privacy leakage risk in practical deployment.
We propose a discriminative-generative distillation approach to learn privacy-preserving deep models.
Our approach can control query cost over private data and accuracy degradation in a unified manner.
arXiv Detail & Related papers (2024-09-04T03:06:13Z) - Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving
Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients (a minimal sketch of this mechanism appears after this list).
While this protection is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - P3GM: Private High-Dimensional Data Release via Privacy Preserving
Phased Generative Model [23.91327154831855]
This paper proposes privacy-preserving phased generative model (P3GM) for releasing sensitive data.
P3GM employs a two-phase learning process to make it robust against noise and to increase learning efficiency.
Compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
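By way of contrast with the Lipschitz-constraint approach above, the DP-SGD mechanism mentioned in the "Large Scale Transfer Learning for Differentially Private Image Classification" entry clips per-example gradients and perturbs them with Gaussian noise. The microbatch sketch below is a generic illustration with assumed, uncalibrated parameters (clip_norm, noise_multiplier); it is not taken from any of the listed papers.

```python
# Minimal DP-SGD sketch (microbatch variant): clip each example's gradient to
# a fixed L2 norm, add Gaussian noise, then average. clip_norm and
# noise_multiplier are illustrative values, not a calibrated privacy budget.
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):            # one example at a time
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):           # per-example clipping
            s.add_(g * scale)

    batch_size = len(batch_x)
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / batch_size         # noisy averaged gradient
    optimizer.step()
```

The per-example loop makes the clipping explicit but is slow, which is the computational-cost issue that entry highlights; production implementations vectorise the per-example gradient computation.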
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.