TAN Without a Burn: Scaling Laws of DP-SGD
- URL: http://arxiv.org/abs/2210.03403v2
- Date: Wed, 24 May 2023 10:20:05 GMT
- Title: TAN Without a Burn: Scaling Laws of DP-SGD
- Authors: Tom Sander, Pierre Stock, Alexandre Sablayrolles
- Abstract summary: Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially Private methods for training Deep Neural Networks (DNNs) have
progressed recently, in particular with the use of massive batches and
aggregated data augmentations for a large number of training steps. These
techniques require much more computing resources than their non-private
counterparts, shifting the traditional privacy-accuracy trade-off to a
privacy-accuracy-compute trade-off and making hyper-parameter search virtually
impossible for realistic scenarios. In this work, we decouple privacy analysis
and experimental behavior of noisy training to explore the trade-off with
minimal computational requirements. We first use the tools of Rényi
Differential Privacy (RDP) to highlight that the privacy budget, when not
overcharged, only depends on the total amount of noise (TAN) injected
throughout training. We then derive scaling laws for training models with
DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in
computational budget. We apply the proposed method on CIFAR-10 and ImageNet
and, in particular, strongly improve the state-of-the-art on ImageNet with a +9
points gain in top-1 accuracy for a privacy budget $\varepsilon = 8$.
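As a rough illustration of the TAN observation, the sketch below estimates the DP-SGD privacy budget with the standard small-sampling-rate RDP approximation for the subsampled Gaussian mechanism (per step, $\varepsilon(\alpha) \approx 2\alpha q^2/\sigma^2$). This is not the paper's accountant, and the helper name approx_epsilon is ours; under this approximation the budget depends on $(q, \sigma, S)$ only through $q\sqrt{S}/\sigma$, so two configurations with the same $\sigma/B$ ratio, and hence the same total amount of noise, land on the same $\varepsilon$:

```python
# Minimal sketch, not the paper's accountant: estimate the DP-SGD budget with
# the standard small-q RDP approximation for the subsampled Gaussian
# mechanism, eps(alpha) ~= 2 * alpha * q^2 / sigma^2 per step (valid for
# sampling rate q << 1 and sigma >~ 1).
import math

def approx_epsilon(sigma, batch_size, dataset_size, steps, delta=1e-5):
    """Approximate (eps, delta)-DP guarantee after `steps` DP-SGD iterations."""
    q = batch_size / dataset_size  # Poisson sampling rate
    def eps_at(alpha):
        rdp = steps * 2.0 * alpha * q**2 / sigma**2       # total RDP at order alpha
        return rdp + math.log(1.0 / delta) / (alpha - 1)  # RDP -> (eps, delta)
    return min(eps_at(alpha) for alpha in range(2, 256))

# Same ratio sigma / B (hence the same q * sqrt(S) / sigma), very different
# per-step compute budgets -- the estimated privacy budget is identical:
print(approx_epsilon(sigma=4.0, batch_size=1024, dataset_size=50_000, steps=10_000))
print(approx_epsilon(sigma=1.0, batch_size=256,  dataset_size=50_000, steps=10_000))
```

This is what makes cheap hyper-parameter search possible: one can tune at a small batch size with a proportionally smaller $\sigma$ while keeping the privacy analysis aligned with the large-batch run. Note that this approximation degrades for $\sigma$ below roughly 1, which is the "overcharged" regime the abstract excludes.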
Related papers
- Differentially Private Image Classification by Learning Priors from Random Processes
In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition.
A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors learned on real-world public data.
In this work, we explore how to improve the privacy-utility trade-off of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data; a minimal sketch of the clipping-and-noise step follows.
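To make "per-sample gradient clipping and noise addition" concrete, here is a minimal NumPy sketch of one DP-SGD update under our own simplifications (flattened per-example gradients, fixed batch); production code would obtain per-example gradients from a library such as Opacus:

```python
# Toy DP-SGD update: clip each example's gradient, sum, add calibrated
# Gaussian noise, average, and take a gradient step. Illustrative only.
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    batch_size = per_example_grads.shape[0]
    # Per-sample clipping: rescale any gradient whose L2 norm exceeds clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Gaussian noise with std sigma * C, calibrated to the clipping bound.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean
```

Clipping bounds each example's contribution (the sensitivity), which is what lets the added Gaussian noise translate into a formal privacy guarantee.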
arXiv Detail & Related papers (2023-06-08T04:14:32Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups underserved in terms of model utility simultaneously experience weaker privacy guarantees, as the rough calculation below illustrates.
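One way to see this claim numerically, using the same small-q RDP approximation as the sketch further above and assuming, purely for illustration, that an example's per-step gradient norm c_i is constant across training: an example clipped below the bound C effectively faces noise multiplier sigma * C / c_i, hence a smaller individual epsilon (the paper's accountant is more careful than this):

```python
# Schematic individual accounting: a "quieter" example with per-step gradient
# norm c_i < C effectively faces noise multiplier sigma * C / c_i, hence a
# smaller individual epsilon. Same small-q RDP approximation as before.
import math

def individual_eps(sigma, c_i, C=1.0, q=1024 / 50_000, steps=10_000, delta=1e-5):
    sigma_eff = sigma * C / c_i  # larger effective noise for quieter examples
    return min(steps * 2.0 * a * q**2 / sigma_eff**2 + math.log(1 / delta) / (a - 1)
               for a in range(2, 256))

for c_i in (1.0, 0.5, 0.1):  # c_i = C is the worst case
    print(f"grad norm {c_i}: individual eps ~ {individual_eps(4.0, c_i):.2f}")
```

Examples whose gradients sit at the clipping bound (often atypical or poorly fit ones) get the worst-case budget, while typical examples enjoy a much smaller individual epsilon.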
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation
Training even moderately sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building on a good, relevant representation learned from an informative public dataset, then modeling the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Large Scale Transfer Learning for Differentially Private Image Classification
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this guarantee is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy; a rough sketch of the idea follows.
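As a loose sketch of the GEP idea rather than the authors' exact algorithm: project per-example gradients onto a low-dimensional subspace (assumed here to be an orthonormal `anchor` basis estimated from public, non-sensitive gradients), clip and perturb the embedding and the residual separately, then recombine:

```python
# Schematic GEP-style update, under our assumptions (anchor is a k x p
# orthonormal basis from public gradients; per-example gradients flattened).
import numpy as np

def gep_step(per_example_grads, anchor, clip_embed=1.0, clip_resid=1.0, sigma=1.0):
    batch_size = per_example_grads.shape[0]
    embed = per_example_grads @ anchor.T        # (B, k) low-dimensional part
    resid = per_example_grads - embed @ anchor  # (B, p) residual part

    def clip(x, bound):  # per-example L2 clipping
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x * np.minimum(1.0, bound / (norms + 1e-12))

    embed, resid = clip(embed, clip_embed), clip(resid, clip_resid)
    # Perturb the two components separately; most of the signal lives in the
    # low-dimensional embedding, which needs far less noise than full DP-SGD.
    noisy_embed = embed.sum(0) + np.random.normal(0.0, sigma * clip_embed, embed.shape[1])
    noisy_resid = resid.sum(0) + np.random.normal(0.0, sigma * clip_resid, resid.shape[1])
    return (noisy_embed @ anchor + noisy_resid) / batch_size
```

The point of the decomposition is that noise added in a k-dimensional embedding hurts utility much less than noise spread across all p parameters.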
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.