Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance
- URL: http://arxiv.org/abs/2303.01256v1
- Date: Thu, 2 Mar 2023 13:36:28 GMT
- Title: Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance
- Authors: Xin Gu, Gautam Kamath, Zhiwei Steven Wu
- Abstract summary: Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters.
Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data.
We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples.
- Score: 35.653510597396114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private stochastic gradient descent privatizes model training
by injecting noise into each iteration, where the noise magnitude increases
with the number of model parameters. Recent works suggest that we can reduce
the noise by leveraging public data for private machine learning, by projecting
gradients onto a subspace prescribed by the public data. However, given a
choice of public datasets, it is not a priori clear which one may be most
appropriate for the private task. We give an algorithm for selecting a public
dataset by measuring a low-dimensional subspace distance between gradients of
the public and private examples. We provide theoretical analysis demonstrating
that the excess risk scales with this subspace distance. This distance is easy
to compute and robust to modifications in the setting. Empirical evaluation
shows that trained model accuracy is monotone in this distance.
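To make the selection rule concrete, the following is a minimal sketch (not the authors' implementation) of one standard notion of subspace distance: take the top-k right-singular subspaces of the public and private per-example gradient matrices and compare them through their principal angles. The matrix shapes, the choice of k, and the exact normalization are illustrative assumptions.

```python
import numpy as np

def top_k_subspace(grads, k):
    """Orthonormal basis for the top-k right-singular subspace of an
    (n_examples, n_params) matrix of per-example gradients."""
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T  # (n_params, k)

def subspace_distance(grads_pub, grads_priv, k=16):
    """Sine of the largest principal angle between the two top-k
    gradient subspaces: 0 if they coincide, 1 if orthogonal."""
    v_pub = top_k_subspace(grads_pub, k)
    v_priv = top_k_subspace(grads_priv, k)
    # Singular values of V_pub^T V_priv are the cosines of the
    # principal angles between the two subspaces.
    cosines = np.clip(np.linalg.svd(v_pub.T @ v_priv, compute_uv=False), 0.0, 1.0)
    return float(np.sqrt(1.0 - cosines.min() ** 2))

# Hypothetical usage: choose the candidate public dataset whose
# gradient subspace is closest to the private one.
rng = np.random.default_rng(0)
grads_priv = rng.normal(size=(256, 512))
candidates = {name: rng.normal(size=(256, 512)) for name in ("pub_a", "pub_b")}
best = min(candidates, key=lambda n: subspace_distance(candidates[n], grads_priv))
```

The claimed monotonicity of trained-model accuracy in this distance is what makes such a cheap, pre-training comparison useful as a selection criterion.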
Related papers
- Certification for Differentially Private Prediction in Gradient-Based Training [36.686002369773014]
We use convex relaxation and bound propagation to compute a provable upper bound on the local and smooth sensitivity of a prediction.
This bound allows us to reduce the magnitude of noise added or improve privacy accounting in the private prediction setting (see the calibration sketch after this list).
arXiv Detail & Related papers (2024-06-19T10:47:00Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately-sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose Gradient Embedding Perturbation (GEP), an algorithm for training differentially private deep models with decent accuracy (a simplified sketch appears after this list).
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm (a toy version appears after this list).
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
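The certification entry above pairs with a familiar recipe: once a provable sensitivity upper bound is computed, it directly sets how little noise suffices. Below is a minimal sketch of the standard Gaussian mechanism given such a bound; this is textbook calibration, not the paper's certification procedure, and the prediction values and privacy parameters are illustrative assumptions.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity_bound, epsilon, delta, rng):
    """Release `value` with (epsilon, delta)-DP given a provable upper
    bound on its sensitivity; a tighter bound means less noise."""
    # Classical calibration (valid for epsilon <= 1).
    sigma = sensitivity_bound * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(scale=sigma, size=np.shape(value))

rng = np.random.default_rng(0)
prediction = np.array([0.1, 0.7, 0.2])                        # hypothetical output
loose = gaussian_mechanism(prediction, 1.0, 1.0, 1e-5, rng)   # worst-case bound
tight = gaussian_mechanism(prediction, 0.05, 1.0, 1e-5, rng)  # certified bound
```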
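The GEP entry is closest in spirit to the main paper: it likewise projects private gradients onto a low-dimensional basis derived from public "anchor" gradients, so that noise is added in k dimensions rather than in the full parameter space. A simplified sketch follows; the full GEP algorithm also perturbs and re-adds the residual component, which is omitted here, and the shapes, clipping norm, and noise scale are assumptions.

```python
import numpy as np

def gep_step(public_grads, private_grads, k, clip, sigma, rng):
    """One simplified GEP-style update: clip, embed, perturb, decode."""
    # Basis from public "anchor" gradients: top-k right-singular vectors.
    _, _, vt = np.linalg.svd(public_grads, full_matrices=False)
    basis = vt[:k]                                   # (k, n_params)
    # Clip each private per-example gradient to norm at most `clip`.
    norms = np.linalg.norm(private_grads, axis=1, keepdims=True)
    clipped = private_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Project into the k-dimensional embedding and aggregate.
    embedded = clipped @ basis.T                     # (n_examples, k)
    noisy = embedded.sum(axis=0) + rng.normal(scale=sigma * clip, size=k)
    # Decode the perturbed embedding back to parameter space.
    return (noisy @ basis) / len(private_grads)
```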
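Finally, the one-pass private sketch entry combines locality-sensitive hashing with noisy count tables. A toy version is sketched below using random-hyperplane (SimHash) hashing and a single round of Laplace noise; the table layout, hashing scheme, and sensitivity accounting are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def build_private_sketch(X, y, n_tables, n_bits, epsilon, rng):
    """Toy one-pass private sketch: LSH-indexed count tables per label,
    with Laplace noise added once after the single pass."""
    n_labels = int(y.max()) + 1
    planes = rng.normal(size=(n_tables, n_bits, X.shape[1]))  # SimHash planes
    tables = np.zeros((n_tables, n_labels, 2 ** n_bits))
    powers = 2 ** np.arange(n_bits)
    for x, label in zip(X, y):                 # one pass over the data
        bits = (planes @ x > 0).astype(int)    # (n_tables, n_bits) sign pattern
        buckets = bits @ powers                # one bucket index per table
        tables[np.arange(n_tables), label, buckets] += 1
    # Each example touches one cell per table, so L1 sensitivity = n_tables.
    tables += rng.laplace(scale=n_tables / epsilon, size=tables.shape)
    return planes, tables

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] > 0).astype(int)
planes, sketch = build_private_sketch(X, y, n_tables=4, n_bits=6, epsilon=1.0, rng=rng)
```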