Differentially Private Deep Learning with ModelMix
- URL: http://arxiv.org/abs/2210.03843v1
- Date: Fri, 7 Oct 2022 22:59:00 GMT
- Title: Differentially Private Deep Learning with ModelMix
- Authors: Hanshen Xiao, Jun Wan, and Srinivas Devadas
- Abstract summary: We propose a generic optimization framework, called ModelMix, which performs random aggregation of intermediate model states.
It strengthens the composite privacy analysis utilizing the entropy of the training trajectory.
We present a formal study on the effect of gradient clipping in Differentially Private Gradient Descent.
- Score: 14.445182641912014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training large neural networks with meaningful/usable differential privacy
security guarantees is a demanding challenge. In this paper, we tackle this
problem by revisiting the two key operations in Differentially Private
Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient
clipping. We propose a generic optimization framework, called {\em ModelMix},
which performs random aggregation of intermediate model states. It strengthens
the composite privacy analysis utilizing the entropy of the training trajectory
and improves the $(\epsilon, \delta)$ DP security parameters by an order of
magnitude.
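The two DP-SGD operations named above (gradient clipping and iterative perturbation) can be sketched concretely. Below is a minimal NumPy sketch of one DP-SGD step, assuming a flat parameter vector and per-example gradients as arrays; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    average, add Gaussian noise, then take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # 2) gradient clipping: rescale so the l2 norm is at most clip_norm
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # 1) iterative perturbation: Gaussian noise calibrated to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.0, 0.0, 10.0])]
# with noise_multiplier=0 this reduces to clipped (non-private) SGD:
# clipped grads are [0.6, 0.8, 0] and [0, 0, 1], so the step is -0.1 * [0.3, 0.4, 0.5]
new_params = dp_sgd_step(np.zeros(3), grads, clip_norm=1.0,
                         noise_multiplier=0.0, lr=0.1, rng=rng)
```

With a positive `noise_multiplier`, the injected Gaussian noise is what the composite $(\epsilon, \delta)$ analysis accounts for across iterations.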
We provide rigorous analyses for both the utility guarantees and privacy
amplification of ModelMix. In particular, we present a formal study on the
effect of gradient clipping in DP-SGD, which provides theoretical instruction
on how hyper-parameters should be selected. We also introduce a refined
gradient clipping method, which can further sharpen the privacy loss in private
learning when combined with ModelMix.
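As one illustrative reading of "random aggregation of intermediate model states", a mixing step might form a random convex combination of two recent intermediate states. This is a hedged sketch under that assumption only; the exact mixing distribution and state selection in ModelMix may differ.

```python
import numpy as np

def model_mix(state_a, state_b, rng):
    """Randomly aggregate two intermediate model states via a random
    convex combination. Illustrative only: the actual ModelMix
    aggregation rule may differ from uniform mixing."""
    lam = rng.uniform(0.0, 1.0)  # random mixing coefficient
    return lam * state_a + (1.0 - lam) * state_b
```

The randomness of the mixing coefficient is what contributes the "entropy of the training trajectory" that the privacy analysis can exploit.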
Thorough experiments with significant privacy/utility improvement are
presented to support our theory. We train a Resnet-20 network on CIFAR10 with
$70.4\%$ accuracy via ModelMix given $(\epsilon=8, \delta=10^{-5})$ DP-budget,
compared to the same performance but with $(\epsilon=145.8,\delta=10^{-5})$
using regular DP-SGD; assisted with additional public low-dimensional gradient
embedding, one can further improve the accuracy to $79.1\%$ with
$(\epsilon=6.1, \delta=10^{-5})$ DP-budget, compared to the same performance
but with $(\epsilon=111.2, \delta=10^{-5})$ without ModelMix.
Related papers
- LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models [31.718398512438238]
We propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism.
It takes the first step to enable the tight composition of accurately fine-tuning language models with a sub-optimal DP mechanism.
LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees.
arXiv Detail & Related papers (2024-05-29T05:32:50Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
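A private zeroth-order step of the kind DP-ZO describes can be sketched as privatizing the scalar finite difference instead of the full gradient. The clipping and noise calibration below are illustrative assumptions, not DP-ZO's exact mechanism.

```python
import numpy as np

def dp_zo_step(w, loss_fn, lr, mu, clip, noise_std, rng):
    """One zeroth-order step with a privatized finite difference.
    Sketch only: DP-ZO's actual clipping/noise calibration may differ."""
    z = rng.normal(size=w.shape)  # random perturbation direction
    # scalar directional derivative estimate from two loss evaluations
    diff = (loss_fn(w + mu * z) - loss_fn(w - mu * z)) / (2 * mu)
    diff = max(-clip, min(clip, diff))   # clip the scalar estimate
    diff += rng.normal(0.0, noise_std)   # privatize: noise on one scalar
    return w - lr * diff * z
```

Because only a single scalar is privatized per step, the noise does not scale with the number of model parameters, which is the appeal for large language models.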
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Differentially Private Image Classification from Features [53.75086935617644]
Leveraging transfer learning has been shown to be an effective strategy for training large models with Differential Privacy.
Recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP.
arXiv Detail & Related papers (2022-11-24T04:04:20Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning [34.630300910399036]
We characterize the fundamental communication cost required to obtain the best accuracy under $\varepsilon$ central DP.
Our results show that $\tilde{O}\left(\min(n^2 \varepsilon^2, d)\right)$ bits per client are both sufficient and necessary.
This provides a significant improvement relative to state-of-the-art SecAgg distributed DP schemes.
arXiv Detail & Related papers (2022-03-07T22:56:09Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
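The GEP idea of perturbing a low-dimensional gradient embedding, rather than the full gradient, can be sketched as follows. The sketch assumes an orthonormal public basis is already available; the handling of the residual component in the actual algorithm is omitted here.

```python
import numpy as np

def gep_perturb(grad, public_basis, clip, noise_std, rng):
    """Gradient Embedding Perturbation sketch: project a clipped gradient
    onto a low-dimensional public subspace, add noise only in that
    subspace, and map back. Illustrative: GEP also privatizes a residual
    term, which is omitted here."""
    # public_basis: (k, d) array with orthonormal rows estimated from public data
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))  # clip
    emb = public_basis @ g                             # k-dimensional embedding
    emb += rng.normal(0.0, noise_std, size=emb.shape)  # noise in k dims, not d
    return public_basis.T @ emb                        # back to parameter space
```

Adding noise in $k \ll d$ dimensions is why such embeddings mitigate the utility degradation that the summary above attributes to models with many trainable parameters.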
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.