Differentially Private Neural Tangent Kernels for Privacy-Preserving
Data Generation
- URL: http://arxiv.org/abs/2303.01687v2
- Date: Tue, 27 Feb 2024 22:01:30 GMT
- Title: Differentially Private Neural Tangent Kernels for Privacy-Preserving
Data Generation
- Authors: Yilin Yang, Kamil Adamczewski, Danica J. Sutherland, Xiaoxiao Li,
Mijung Park
- Abstract summary: This work considers using the features of $\textit{neural tangent kernels}$ (NTKs), more precisely $\textit{empirical}$ NTKs (e-NTKs).
We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of perceptual features taken from networks pre-trained on public data.
- Score: 32.83436754714798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximum mean discrepancy (MMD) is a particularly useful distance metric for
differentially private data generation: when used with finite-dimensional
features it allows us to summarize and privatize the data distribution once,
which we can repeatedly use during generator training without further privacy
loss. An important question in this framework is, then, what features are
useful to distinguish between real and synthetic data distributions, and
whether those enable us to generate quality synthetic data. This work considers
the using the features of $\textit{neural tangent kernels (NTKs)}$, more
precisely $\textit{empirical}$ NTKs (e-NTKs). We find that, perhaps
surprisingly, the expressiveness of the untrained e-NTK features is comparable
to that of perceptual features taken from networks pre-trained on public data.
As a result, our method improves the privacy-accuracy trade-off compared
to other state-of-the-art methods, without relying on any public data, as
demonstrated on several tabular and image benchmark datasets.
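To make the recipe above concrete, here is a minimal sketch of the privatize-once mean-embedding pipeline. It is an illustration under stated assumptions, not the authors' code: `featurize` is a stand-in for the paper's e-NTK feature map (any finite-dimensional feature normalized to unit norm fits the privacy argument), neighboring datasets are taken to differ by replacing one record, and the noise scale uses the classic Gaussian-mechanism calibration (valid for $\epsilon \le 1$).

```python
import torch

def featurize(x, feat_net):
    """Stand-in for the paper's e-NTK feature map: any finite-dimensional
    feature map works here, normalized to unit L2 norm per sample so that
    the sensitivity bound below holds."""
    phi = feat_net(x)
    return phi / phi.norm(dim=1, keepdim=True)

def private_mean_embedding(private_data, feat_net, eps, delta):
    """Privatize the mean embedding of the real data ONCE.
    With unit-norm features, replacing one of n records moves the mean
    by at most 2/n in L2, so the L2 sensitivity is 2/n."""
    n = private_data.shape[0]
    with torch.no_grad():
        mu = featurize(private_data, feat_net).mean(dim=0)
    sensitivity = 2.0 / n
    # Classic Gaussian-mechanism calibration (assumes eps <= 1); a tighter
    # accountant would allow a smaller sigma.
    sigma = sensitivity * (2 * torch.log(torch.tensor(1.25 / delta))).sqrt() / eps
    return mu + sigma * torch.randn_like(mu)

def mmd_loss(noisy_mu, synthetic, feat_net):
    """Squared MMD between the fixed, privatized real embedding and the
    embedding of a generated batch."""
    mu_syn = featurize(synthetic, feat_net).mean(dim=0)
    return ((noisy_mu - mu_syn) ** 2).sum()
```

A generator G is then trained by minimizing `mmd_loss(noisy_mu, G(z), feat_net)` over fresh noise batches `z`; because `noisy_mu` was privatized once up front, the training loop consumes no additional privacy budget.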
Related papers
- Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs).
In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z)
- Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning [47.042811490685324]
Mitigating the risk of this information leakage using state-of-the-art differentially private algorithms does not come for free.
In this paper, we consider a representation learning objective that various parties collaboratively refine on a federated model, with differential privacy guarantees.
We observe a significant performance improvement over the prior work under the same small privacy budget.
arXiv Detail & Related papers (2023-09-11T14:46:55Z)
- Differentially Private Synthetic Data Using KD-Trees [11.96971298978997]
We exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms.
We propose both data-independent and data-dependent algorithms for $\epsilon$-differentially private synthetic data generation.
We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
arXiv Detail & Related papers (2023-06-19T17:08:32Z)
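As a hedged illustration of the space-partitioning-plus-noise idea (a generic data-independent variant on the unit cube, not necessarily the paper's exact algorithm; all names are mine): because the cells are disjoint, one record affects exactly one count, so Laplace noise of scale $1/\epsilon$ per count makes the whole histogram $\epsilon$-DP by parallel composition.

```python
import numpy as np

def midpoint_cells(depth, d):
    """Data-independent KD-style partition of [0,1)^d: split dimension
    (level mod d) at its midpoint, `depth` times -> 2**depth boxes."""
    cells = [(np.zeros(d), np.ones(d))]
    for level in range(depth):
        axis, nxt = level % d, []
        for lo, hi in cells:
            mid = (lo[axis] + hi[axis]) / 2
            left_hi, right_lo = hi.copy(), lo.copy()
            left_hi[axis], right_lo[axis] = mid, mid
            nxt += [(lo, left_hi), (right_lo, hi)]
        cells = nxt
    return cells

def dp_synthetic(data, eps, depth, n_out, rng):
    """eps-DP synthetic sample: Laplace-noised counts over disjoint
    cells, then uniform resampling inside each cell."""
    cells = midpoint_cells(depth, data.shape[1])
    counts = np.array([np.all((data >= lo) & (data < hi), axis=1).sum()
                       for lo, hi in cells], dtype=float)
    # Sensitivity 1 per count (disjoint cells), hence Laplace(1/eps).
    noisy = np.maximum(counts + rng.laplace(0.0, 1.0 / eps, len(cells)), 0.0)
    probs = (noisy + 1e-12) / (noisy + 1e-12).sum()   # guard against all-zero
    picks = rng.choice(len(cells), size=n_out, p=probs)
    return np.stack([rng.uniform(cells[i][0], cells[i][1]) for i in picks])
```

Usage, assuming `data` has been rescaled to $[0,1)^d$: `dp_synthetic(data, eps=1.0, depth=6, n_out=1000, rng=np.random.default_rng(0))`.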
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; it can fall short, however, when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups which are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
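For reference, a minimal and non-private Sinkhorn divergence between two point clouds, the kind of optimal-transport objective the entry above refers to; this standard log-domain version is my own sketch and omits DP-Sinkhorn's privacy mechanism and implementation details.

```python
import torch

def sinkhorn_cost(x, y, eps=0.05, iters=200):
    """Entropy-regularized OT cost between uniform point clouds x and y,
    computed with log-domain Sinkhorn fixed-point iterations."""
    n, m = x.shape[0], y.shape[0]
    C = torch.cdist(x, y) ** 2                      # pairwise squared distances
    log_a = -torch.log(torch.tensor(float(n)))      # uniform weights on x
    log_b = -torch.log(torch.tensor(float(m)))      # uniform weights on y
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(iters):
        f = -eps * torch.logsumexp(log_b + (g - C) / eps, dim=1)
        g = -eps * torch.logsumexp(log_a + (f[:, None] - C) / eps, dim=0)
    return f.mean() + g.mean()                      # dual value, uniform weights

def sinkhorn_divergence(x, y, eps=0.05, iters=200):
    """Debiased Sinkhorn divergence, usable as a generator loss in place
    of an adversarial objective."""
    return (sinkhorn_cost(x, y, eps, iters)
            - 0.5 * sinkhorn_cost(x, x, eps, iters)
            - 0.5 * sinkhorn_cost(y, y, eps, iters))
```

The loss is differentiable in `y`, so a generator can be trained by descending `sinkhorn_divergence(real_batch, fake_batch)` directly; making that access to real batches private is exactly what DP-Sinkhorn adds.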
- Anonymizing Sensor Data on the Edge: A Representation Learning and Transformation Approach [4.920145245773581]
In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation.
We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data.
We show that it can anonymize data in real time on resource-constrained edge devices.
arXiv Detail & Related papers (2020-11-16T22:32:30Z)
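A hedged sketch of what "deterministic and probabilistic transformations in the latent space" could look like, assuming a pre-trained VAE whose encoder returns a (mean, logvar) pair; both variants and all names here are hypothetical illustrations, not the paper's method.

```python
import torch

def probabilistic_transform(x, encoder, decoder, noise_scale=0.5):
    """Probabilistic obfuscation (hypothetical variant): re-sample
    around the encoded mean before decoding."""
    mu, _ = encoder(x)                       # assumed (mean, logvar) output
    z = mu + noise_scale * torch.randn_like(mu)
    return decoder(z)

def deterministic_transform(x, encoder, decoder, anchors):
    """Deterministic obfuscation (hypothetical variant): snap each
    latent code to its nearest anchor code before decoding."""
    mu, _ = encoder(x)
    idx = torch.cdist(mu, anchors).argmin(dim=1)
    return decoder(anchors[idx])
```

The noise scale (probabilistic) or the coarseness of the anchor set (deterministic) is then the knob behind the utility-privacy trade-off the entry describes.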
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose FDSKL, a federated doubly stochastic kernel learning algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
- DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation [11.312036995195594]
We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings.
We exploit the random feature representations for two important benefits.
Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
arXiv Detail & Related papers (2020-02-26T16:41:41Z)
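The random-feature construction in DP-MERF can be illustrated with random Fourier features for a Gaussian kernel; this is a sketch under my own naming, reusing the same classic Gaussian-mechanism calibration as the e-NTK sketch above, not the authors' released code.

```python
import numpy as np

def rff_embedding(x, W):
    """Random Fourier features for the Gaussian kernel: with W drawn
    i.i.d. N(0, 1/length_scale^2), phi(x)^T phi(y) approximates
    exp(-||x - y||^2 / (2 * length_scale^2)) and ||phi(x)||_2 = 1."""
    proj = x @ W                                    # (n, n_features // 2)
    feats = np.hstack([np.cos(proj), np.sin(proj)])
    return feats / np.sqrt(W.shape[1])

def dp_mean_embedding(data, length_scale, n_features, eps, delta, rng):
    """(eps, delta)-DP mean embedding: privatized once, reusable for the
    whole generator training run."""
    W = rng.normal(0.0, 1.0 / length_scale, size=(data.shape[1], n_features // 2))
    mu = rff_embedding(data, W).mean(axis=0)
    sens = 2.0 / data.shape[0]                      # unit-norm features, replace-one
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # assumes eps <= 1
    return mu + rng.normal(0.0, sigma, size=mu.shape), W
```

During training, the fixed noisy embedding is compared against `rff_embedding(generated, W).mean(axis=0)`, so no additional privacy budget is spent after the one-time release.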