Differentially Private Neural Tangent Kernels for Privacy-Preserving
Data Generation
- URL: http://arxiv.org/abs/2303.01687v2
- Date: Tue, 27 Feb 2024 22:01:30 GMT
- Title: Differentially Private Neural Tangent Kernels for Privacy-Preserving
Data Generation
- Authors: Yilin Yang, Kamil Adamczewski, Danica J. Sutherland, Xiaoxiao Li,
Mijung Park
- Abstract summary: This work considers using the features of $\textit{neural tangent kernels}$ (NTKs), more precisely $\textit{empirical}$ NTKs (e-NTKs).
We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of perceptual features taken from networks pre-trained on public data.
- Score: 32.83436754714798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximum mean discrepancy (MMD) is a particularly useful distance metric for
differentially private data generation: when used with finite-dimensional
features it allows us to summarize and privatize the data distribution once,
which we can repeatedly use during generator training without further privacy
loss. An important question in this framework is, then, what features are
useful to distinguish between real and synthetic data distributions, and
whether those enable us to generate quality synthetic data. This work considers
the using the features of $\textit{neural tangent kernels (NTKs)}$, more
precisely $\textit{empirical}$ NTKs (e-NTKs). We find that, perhaps
surprisingly, the expressiveness of the untrained e-NTK features is comparable
to that of perceptual features taken from networks pre-trained on public data.
As a result, our method improves the privacy-accuracy trade-off compared
to other state-of-the-art methods, without relying on any public data, as
demonstrated on several tabular and image benchmark datasets.
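To make the recipe above concrete, here is a minimal sketch of the privatize-once mean-embedding pipeline. It is an illustration under stated assumptions, not the authors' code: `featurize` is a stand-in for the paper's e-NTK feature map (any finite-dimensional feature normalized to unit norm fits the privacy argument), neighboring datasets are taken to differ by replacing one record, and the noise scale uses the classic Gaussian-mechanism calibration (valid for $\epsilon \le 1$).

```python
import torch

def featurize(x, feat_net):
    """Stand-in for the paper's e-NTK feature map: any finite-dimensional
    feature map works here, normalized to unit L2 norm per sample so that
    the sensitivity bound below holds."""
    phi = feat_net(x)
    return phi / phi.norm(dim=1, keepdim=True)

def private_mean_embedding(private_data, feat_net, eps, delta):
    """Privatize the mean embedding of the real data ONCE.
    With unit-norm features, replacing one of n records moves the mean
    by at most 2/n in L2, so the L2 sensitivity is 2/n."""
    n = private_data.shape[0]
    with torch.no_grad():
        mu = featurize(private_data, feat_net).mean(dim=0)
    sensitivity = 2.0 / n
    # Classic Gaussian-mechanism calibration (assumes eps <= 1); a tighter
    # accountant would allow a smaller sigma.
    sigma = sensitivity * (2 * torch.log(torch.tensor(1.25 / delta))).sqrt() / eps
    return mu + sigma * torch.randn_like(mu)

def mmd_loss(noisy_mu, synthetic, feat_net):
    """Squared MMD between the fixed, privatized real embedding and the
    embedding of a generated batch."""
    mu_syn = featurize(synthetic, feat_net).mean(dim=0)
    return ((noisy_mu - mu_syn) ** 2).sum()
```

A generator G is then trained by minimizing `mmd_loss(noisy_mu, G(z), feat_net)` over fresh noise batches `z`; because `noisy_mu` was privatized once up front, the training loop consumes no additional privacy budget.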
Related papers
- Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs).
In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z)
- Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning [47.042811490685324]
Mitigating the risk of this information leakage using state-of-the-art differentially private algorithms does not come for free.
In this paper, we consider a representation learning objective that various parties collaboratively refine on a federated model, with differential privacy guarantees.
We observe a significant performance improvement over the prior work under the same small privacy budget.
arXiv Detail & Related papers (2023-09-11T14:46:55Z)
- Differentially Private Synthetic Data Using KD-Trees [11.96971298978997]
We exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms.
We propose both data-independent and data-dependent algorithms for $\epsilon$-differentially private synthetic data generation.
We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
arXiv Detail & Related papers (2023-06-19T17:08:32Z)
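As a hedged illustration of the space-partitioning-plus-noise idea (a generic data-independent variant on the unit cube, not necessarily the paper's exact algorithm; all names are mine): because the cells are disjoint, one record affects exactly one count, so Laplace noise of scale $1/\epsilon$ per count makes the whole histogram $\epsilon$-DP by parallel composition.

```python
import numpy as np

def midpoint_cells(depth, d):
    """Data-independent KD-style partition of [0,1)^d: split dimension
    (level mod d) at its midpoint, `depth` times -> 2**depth boxes."""
    cells = [(np.zeros(d), np.ones(d))]
    for level in range(depth):
        axis, nxt = level % d, []
        for lo, hi in cells:
            mid = (lo[axis] + hi[axis]) / 2
            left_hi, right_lo = hi.copy(), lo.copy()
            left_hi[axis], right_lo[axis] = mid, mid
            nxt += [(lo, left_hi), (right_lo, hi)]
        cells = nxt
    return cells

def dp_synthetic(data, eps, depth, n_out, rng):
    """eps-DP synthetic sample: Laplace-noised counts over disjoint
    cells, then uniform resampling inside each cell."""
    cells = midpoint_cells(depth, data.shape[1])
    counts = np.array([np.all((data >= lo) & (data < hi), axis=1).sum()
                       for lo, hi in cells], dtype=float)
    # Sensitivity 1 per count (disjoint cells), hence Laplace(1/eps).
    noisy = np.maximum(counts + rng.laplace(0.0, 1.0 / eps, len(cells)), 0.0)
    probs = (noisy + 1e-12) / (noisy + 1e-12).sum()   # guard against all-zero
    picks = rng.choice(len(cells), size=n_out, p=probs)
    return np.stack([rng.uniform(cells[i][0], cells[i][1]) for i in picks])
```

Usage, assuming `data` has been rescaled to $[0,1)^d$: `dp_synthetic(data, eps=1.0, depth=6, n_out=1000, rng=np.random.default_rng(0))`.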
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; it can fall short, however, when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups which are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
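For reference, a minimal and non-private Sinkhorn divergence between two point clouds, the kind of optimal-transport objective the entry above refers to; this standard log-domain version is my own sketch and omits DP-Sinkhorn's privacy mechanism and implementation details.

```python
import torch

def sinkhorn_cost(x, y, eps=0.05, iters=200):
    """Entropy-regularized OT cost between uniform point clouds x and y,
    computed with log-domain Sinkhorn fixed-point iterations."""
    n, m = x.shape[0], y.shape[0]
    C = torch.cdist(x, y) ** 2                      # pairwise squared distances
    log_a = -torch.log(torch.tensor(float(n)))      # uniform weights on x
    log_b = -torch.log(torch.tensor(float(m)))      # uniform weights on y
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(iters):
        f = -eps * torch.logsumexp(log_b + (g - C) / eps, dim=1)
        g = -eps * torch.logsumexp(log_a + (f[:, None] - C) / eps, dim=0)
    return f.mean() + g.mean()                      # dual value, uniform weights

def sinkhorn_divergence(x, y, eps=0.05, iters=200):
    """Debiased Sinkhorn divergence, usable as a generator loss in place
    of an adversarial objective."""
    return (sinkhorn_cost(x, y, eps, iters)
            - 0.5 * sinkhorn_cost(x, x, eps, iters)
            - 0.5 * sinkhorn_cost(y, y, eps, iters))
```

The loss is differentiable in `y`, so a generator can be trained by descending `sinkhorn_divergence(real_batch, fake_batch)` directly; making that access to real batches private is exactly what DP-Sinkhorn adds.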
- Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach [4.920145245773581]
In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation.
We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data.
We show that it can anonymize data in real time on resource-constrained edge devices.
arXiv Detail & Related papers (2020-11-16T22:32:30Z)
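A hedged sketch of what "deterministic and probabilistic transformations in the latent space" could look like, assuming a pre-trained VAE whose encoder returns a (mean, logvar) pair; both variants and all names here are hypothetical illustrations, not the paper's method.

```python
import torch

def probabilistic_transform(x, encoder, decoder, noise_scale=0.5):
    """Probabilistic obfuscation (hypothetical variant): re-sample
    around the encoded mean before decoding."""
    mu, _ = encoder(x)                       # assumed (mean, logvar) output
    z = mu + noise_scale * torch.randn_like(mu)
    return decoder(z)

def deterministic_transform(x, encoder, decoder, anchors):
    """Deterministic obfuscation (hypothetical variant): snap each
    latent code to its nearest anchor code before decoding."""
    mu, _ = encoder(x)
    idx = torch.cdist(mu, anchors).argmin(dim=1)
    return decoder(anchors[idx])
```

The noise scale (probabilistic) or the coarseness of the anchor set (deterministic) is then the knob behind the utility-privacy trade-off the entry describes.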
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose FDSKL, a federated doubly stochastic kernel learning algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
- DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation [11.312036995195594]
We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings.
We exploit the random feature representations for two important benefits.
Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
arXiv Detail & Related papers (2020-02-26T16:41:41Z)
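The random-feature construction in DP-MERF can be illustrated with random Fourier features for a Gaussian kernel; this is a sketch under my own naming, reusing the same classic Gaussian-mechanism calibration as the e-NTK sketch above, not the authors' released code.

```python
import numpy as np

def rff_embedding(x, W):
    """Random Fourier features for the Gaussian kernel: with W drawn
    i.i.d. N(0, 1/length_scale^2), phi(x)^T phi(y) approximates
    exp(-||x - y||^2 / (2 * length_scale^2)) and ||phi(x)||_2 = 1."""
    proj = x @ W                                    # (n, n_features // 2)
    feats = np.hstack([np.cos(proj), np.sin(proj)])
    return feats / np.sqrt(W.shape[1])

def dp_mean_embedding(data, length_scale, n_features, eps, delta, rng):
    """(eps, delta)-DP mean embedding: privatized once, reusable for the
    whole generator training run."""
    W = rng.normal(0.0, 1.0 / length_scale, size=(data.shape[1], n_features // 2))
    mu = rff_embedding(data, W).mean(axis=0)
    sens = 2.0 / data.shape[0]                      # unit-norm features, replace-one
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # assumes eps <= 1
    return mu + rng.normal(0.0, sigma, size=mu.shape), W
```

During training, the fixed noisy embedding is compared against `rff_embedding(generated, W).mean(axis=0)`, so no additional privacy budget is spent after the one-time release.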