DP-MERF: Differentially Private Mean Embeddings with Random Features for
Practical Privacy-Preserving Data Generation
- URL: http://arxiv.org/abs/2002.11603v5
- Date: Tue, 1 Jun 2021 14:38:20 GMT
- Title: DP-MERF: Differentially Private Mean Embeddings with Random Features for
Practical Privacy-Preserving Data Generation
- Authors: Frederik Harder, Kamil Adamczewski, Mijung Park
- Abstract summary: We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings.
We exploit the random feature representations for two important benefits.
Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
- Score: 11.312036995195594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a differentially private data generation paradigm using random
feature representations of kernel mean embeddings when comparing the
distribution of true data with that of synthetic data. We exploit the random
feature representations for two important benefits. First, we require a minimal
privacy cost for training deep generative models. This is because unlike
kernel-based distance metrics that require computing the kernel matrix on all
pairs of true and synthetic data points, we can detach the data-dependent term
from the term solely dependent on synthetic data. Hence, we need to perturb the
data-dependent term only once and then use it repeatedly during the generator
training. Second, we can obtain an analytic sensitivity of the kernel mean
embedding as the random features are norm bounded by construction. This removes
the necessity of hyper-parameter search for a clipping norm to handle the
unknown sensitivity of a generator network. We provide several variants of our
algorithm, differentially-private mean embeddings with random features
(DP-MERF) to jointly generate labels and input features for datasets such as
heterogeneous tabular data and image data. Our algorithm achieves drastically
better privacy-utility trade-offs than existing methods when tested on several
datasets.
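A minimal NumPy sketch of the two benefits described above, assuming unit-norm random Fourier features, a toy linear generator, and illustrative privacy parameters (none of which come from the paper's implementation): the data-dependent mean embedding is perturbed exactly once with Gaussian noise calibrated to its analytic sensitivity, and the generator is then trained against that fixed noisy target at no further privacy cost.
```python
import numpy as np

def rff_features(X, W):
    """Random Fourier features for a Gaussian kernel. Concatenating cos and sin
    and scaling by 1/sqrt(D) gives every feature vector L2 norm exactly 1, i.e.
    the features are norm-bounded by construction."""
    D = W.shape[1]
    proj = X @ W
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

def noisy_mean_embedding(X, W, epsilon, delta, rng):
    """Perturb the data-dependent term once with the Gaussian mechanism. Since
    each phi(x) has norm 1, replacing one of n records moves the mean embedding
    by at most 2/n in L2 norm, so the sensitivity is analytic."""
    n = X.shape[0]
    mu = rff_features(X, W).mean(axis=0)
    sensitivity = 2.0 / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mu + rng.normal(0.0, sigma, size=mu.shape)

def sample_latents(batch, rng):
    """Latent noise with a constant 1 appended so the last row of G acts as a bias."""
    return np.hstack([rng.normal(size=(batch, 4)), np.ones((batch, 1))])

def mmd_loss(G, Z, W, mu_tilde):
    """Squared distance between the fixed, already-privatized data embedding and
    the embedding of samples from a toy linear generator Z @ G."""
    mu_synth = rff_features(Z @ G, W).mean(axis=0)
    return float(np.sum((mu_tilde - mu_synth) ** 2))

rng = np.random.default_rng(0)
X_private = rng.normal(loc=2.0, size=(2000, 2))    # stand-in for sensitive data
W = rng.normal(size=(2, 500))                      # random frequencies (bandwidth 1)

# The privacy cost is paid exactly once, on this line:
mu_tilde = noisy_mean_embedding(X_private, W, epsilon=1.0, delta=1e-5, rng=rng)

# The generator is then trained for arbitrarily many steps against mu_tilde
# without touching the private data again (toy finite-difference updates).
G, lr, h = rng.normal(size=(5, 2)) * 0.1, 0.5, 1e-4
for step in range(300):
    Z = sample_latents(512, rng)
    base = mmd_loss(G, Z, W, mu_tilde)
    grad = np.zeros_like(G)
    for idx in np.ndindex(*G.shape):
        G_pert = G.copy()
        G_pert[idx] += h
        grad[idx] = (mmd_loss(G_pert, Z, W, mu_tilde) - base) / h
    G -= lr * grad

print("synthetic sample mean:", (sample_latents(512, rng) @ G).mean(axis=0))
```
In practice the generator would be a neural network trained with a gradient-based optimizer; the structure stays the same in that only noisy_mean_embedding ever touches the private data.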
Related papers
- Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs).
In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z)
- Optimal Unbiased Randomizers for Regression with Label Differential Privacy [61.63619647307816]
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
arXiv Detail & Related papers (2023-12-09T19:58:34Z)
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
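The mean-estimation entry above states the user-level setting but not its estimator; for orientation only, here is a minimal sketch of a generic user-level DP baseline (clip each user's local mean, average across users, add Gaussian noise), assuming a hypothetical clip_norm and simulated heterogeneous users. It is not the estimator proposed in that paper.
```python
import numpy as np

def user_level_dp_mean(user_datasets, clip_norm, epsilon, delta, rng):
    """Generic baseline for mean estimation under user-level DP (not the paper's
    estimator): each user is reduced to a single clipped local mean, so replacing
    an entire user's data, however many records they hold, changes the aggregate
    by at most 2 * clip_norm / n_users in L2 norm."""
    local_means = []
    for X in user_datasets:                      # users differ in size and distribution
        m = X.mean(axis=0)
        norm = np.linalg.norm(m)
        if norm > clip_norm:                     # bound each user's contribution
            m = m * (clip_norm / norm)
        local_means.append(m)
    avg = np.mean(local_means, axis=0)
    sensitivity = 2.0 * clip_norm / len(user_datasets)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return avg + rng.normal(0.0, sigma, size=avg.shape)

# Illustrative heterogeneous users: per-user sample counts and means both vary.
rng = np.random.default_rng(1)
users = [rng.normal(loc=rng.uniform(-1, 1), size=(rng.integers(5, 500), 3))
         for _ in range(200)]
print(user_level_dp_mean(users, clip_norm=2.0, epsilon=1.0, delta=1e-5, rng=rng))
```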
- Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation [32.83436754714798]
This work considers using the features of neural tangent kernels (NTKs), more precisely empirical NTKs (e-NTKs).
We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data.
arXiv Detail & Related papers (2023-03-03T03:00:49Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it is difficult to apply when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type Conscious Transformation [3.7315964084413173]
We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two conditions.
Under data-type ignorant conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class.
Under data-type aware conditions, the categorical variables are represented by a collection of scores, one for each class.
arXiv Detail & Related papers (2022-05-04T08:40:15Z)
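To make the data-type distinction in the entry above concrete, here is a tiny sketch contrasting the two output representations for a categorical feature: a hard one-hot vector (data-type ignorant) versus a vector of per-class scores (data-type aware). The softmax scoring, the class names, and the toy logits are illustrative assumptions, not the paper's mechanism.
```python
import numpy as np

classes = ["red", "green", "blue"]        # hypothetical categorical feature

def one_hot(index, n_classes):
    """Data-type ignorant output: exactly one class is set."""
    v = np.zeros(n_classes)
    v[index] = 1.0
    return v

def class_scores(logits):
    """Data-type aware output: one score per class (softmax here)."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0])       # toy output of the privacy mechanism
print(one_hot(int(np.argmax(logits)), len(classes)))   # [1. 0. 0.]
print(class_scores(logits))                             # one score per class
```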
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Polynomial magic! Hermite polynomials for private data generation [6.7386666699567845]
Kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation.
We propose to approximate the kernel mean embedding of data distribution using finite-dimensional random features, where sensitivity of the features becomes analytically tractable.
Unlike the random features, the Hermite features are ordered, where the low orders contain more information on the distribution than those at the high orders.
arXiv Detail & Related papers (2021-06-09T12:56:41Z)
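Since the Hermite-feature entry above is a close relative of DP-MERF's random features, here is a minimal sketch of one standard way to obtain ordered finite-dimensional features for a Gaussian-type kernel, via Mehler's formula: the weight of order n decays geometrically, so low orders carry most of the information, matching the remark above. The truncation orders and the parameter rho are assumptions, and this is not necessarily the exact feature map used in that paper.
```python
import numpy as np
from scipy.special import eval_hermite, factorial

def hermite_features(x, order, rho):
    """Ordered features phi_0 .. phi_{order-1} for the 1-D Mehler kernel:
    phi_n(x) = sqrt(rho**n / (2**n * n!)) * H_n(x). The weight of order n
    decays like rho**n, so low orders dominate and high orders refine."""
    n = np.arange(order)
    coeff = np.sqrt(rho ** n / (2.0 ** n * factorial(n)))
    return coeff * np.array([eval_hermite(int(k), x) for k in n])

def mehler_kernel(x, y, rho):
    """Closed form that the feature expansion converges to for |rho| < 1."""
    return np.exp((2 * x * y * rho - (x ** 2 + y ** 2) * rho ** 2) / (1 - rho ** 2)) \
        / np.sqrt(1 - rho ** 2)

x, y, rho = 0.7, -0.3, 0.5
for order in (3, 6, 12):
    approx = hermite_features(x, order, rho) @ hermite_features(y, order, rho)
    print(order, approx, mehler_kernel(x, y, rho))   # truncation error shrinks with order
```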
- Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
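The shuffling entry above is about the amplification analysis itself; as context, here is a minimal sketch of the pipeline it analyzes: each user applies a local randomizer (k-ary randomized response here), an intermediary shuffles the reports, and the analyzer aggregates. The local epsilon and the debiasing step are standard; the amplified central guarantee comes from the paper's bound and is not computed here.
```python
import numpy as np

def randomized_response(value, k, eps_local, rng):
    """k-ary randomized response: report the true value with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other value.
    On its own this local randomizer satisfies eps_local-DP."""
    p_true = np.exp(eps_local) / (np.exp(eps_local) + k - 1)
    if rng.random() < p_true:
        return value
    other = rng.integers(k - 1)
    return other if other < value else other + 1

rng = np.random.default_rng(2)
k, eps_local, n = 5, 1.0, 10_000
true_values = rng.integers(k, size=n)                  # stand-in for sensitive inputs

reports = np.array([randomized_response(v, k, eps_local, rng) for v in true_values])
rng.shuffle(reports)                                   # the shuffler unlinks reports from users

# Debiased histogram estimate computed from the shuffled reports only.
p_true = np.exp(eps_local) / (np.exp(eps_local) + k - 1)
p_other = (1 - p_true) / (k - 1)
counts = np.bincount(reports, minlength=k) / n
print(np.bincount(true_values, minlength=k) / n)       # true frequencies
print((counts - p_other) / (p_true - p_other))         # estimate from shuffled reports
```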
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning algorithm (FDSKL) for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
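The FDSKL entry above concerns kernel learning when features are vertically partitioned across parties. Below is a minimal sketch of why random-feature methods suit that setting, under assumed names and without the actual FDSKL protocol (in particular, no secure aggregation of the partial sums): the random projection W^T x decomposes additively over feature blocks, so each party can contribute a partial projection of its own columns without sharing raw features.
```python
import numpy as np

rng = np.random.default_rng(3)
n, d_a, d_b, D = 100, 3, 2, 50                 # two parties hold 3 and 2 feature columns

X_a = rng.normal(size=(n, d_a))                # party A's columns (never shared)
X_b = rng.normal(size=(n, d_b))                # party B's columns (never shared)
W = rng.normal(size=(d_a + d_b, D))            # shared random frequencies
b = rng.uniform(0, 2 * np.pi, size=D)          # shared random phases

# Each party projects only its own columns; the full projection is just the sum.
proj_a = X_a @ W[:d_a]                         # computed locally by A
proj_b = X_b @ W[d_a:]                         # computed locally by B
phi = np.sqrt(2.0 / D) * np.cos(proj_a + proj_b + b)     # random Fourier features

# Identical to the features a single party holding all columns would compute.
phi_central = np.sqrt(2.0 / D) * np.cos(np.hstack([X_a, X_b]) @ W + b)
print(np.allclose(phi, phi_central))           # True
```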
This list is automatically generated from the titles and abstracts of the papers on this site.