Choosing Public Datasets for Private Machine Learning via Gradient
Subspace Distance
- URL: http://arxiv.org/abs/2303.01256v1
- Date: Thu, 2 Mar 2023 13:36:28 GMT
- Title: Choosing Public Datasets for Private Machine Learning via Gradient
Subspace Distance
- Authors: Xin Gu, Gautam Kamath, Zhiwei Steven Wu
- Abstract summary: Differentially private gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters.
Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data.
We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples.
- Score: 35.653510597396114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private stochastic gradient descent privatizes model training
by injecting noise into each iteration, where the noise magnitude increases
with the number of model parameters. Recent works suggest that we can reduce
the noise by leveraging public data for private machine learning, by projecting
gradients onto a subspace prescribed by the public data. However, given a
choice of public datasets, it is not a priori clear which one may be most
appropriate for the private task. We give an algorithm for selecting a public
dataset by measuring a low-dimensional subspace distance between gradients of
the public and private examples. We provide theoretical analysis demonstrating
that the excess risk scales with this subspace distance. This distance is easy
to compute and robust to modifications in the setting. Empirical evaluation
shows that trained model accuracy is monotone in this distance.
Related papers
- Differentially Private Gradient Flow based on the Sliced Wasserstein
Distance for Non-Parametric Generative Modeling [61.65137699747604]
We introduce a novel differentially private generative modeling approach based on parameter-free gradient flows in the space of probability measures.
Our experiments show that compared to a generator-based model, our proposed model can generate higher-fidelity data at a low privacy budget.
arXiv Detail & Related papers (2023-12-13T15:47:30Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy.
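The core GEP idea of embedding high-dimensional gradients in a low-dimensional public subspace before noising can be sketched roughly as follows. This is a simplified, hypothetical single-step illustration (assumed names throughout; the actual GEP algorithm also handles the residual component outside the subspace, which is omitted here):

```python
import numpy as np

def gep_step(private_grads, public_basis, clip, sigma, rng):
    # private_grads: (n, n_params) per-example gradients.
    # public_basis: (n_params, k) orthonormal columns derived from public data.
    emb = private_grads @ public_basis                      # (n, k) embeddings
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb * np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # clip in k dims
    # Gaussian mechanism in k dimensions instead of n_params dimensions,
    # so the noise magnitude no longer scales with the model size.
    noisy_sum = emb.sum(axis=0) + rng.normal(scale=sigma * clip, size=emb.shape[1])
    return public_basis @ (noisy_sum / len(private_grads))  # back to parameter space
```

The point of the construction is that noise is added to a k-dimensional vector rather than to all trainable parameters, which is why utility degrades less for large models.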
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
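To make the one-pass sketch idea concrete: a heavily simplified, hypothetical version (not the paper's actual construction; all names are illustrative) hashes each row into a bucket via the signs of random projections, a SimHash-style locality-sensitive hash, and releases the bucket counts under the Laplace mechanism:

```python
import numpy as np

def build_private_sketch(data, n_hashes, epsilon, rng):
    # data: (n, d) rows. Each row is mapped to one of 2**n_hashes buckets
    # by the sign pattern of n_hashes random projections (SimHash-style LSH).
    d = data.shape[1]
    planes = rng.normal(size=(d, n_hashes))
    bits = (data @ planes > 0).astype(int)
    idx = bits @ (1 << np.arange(n_hashes))        # bucket index per row
    counts = np.bincount(idx, minlength=2 ** n_hashes).astype(float)
    # Laplace mechanism: each row contributes to exactly one bucket,
    # so the histogram has L1 sensitivity 1.
    counts += rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return planes, counts
```

The single pass over the data and the fixed-size table are what make such a sketch attractive for streaming and large-scale settings.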
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.