Large Scale Transfer Learning for Differentially Private Image
Classification
- URL: http://arxiv.org/abs/2205.02973v1
- Date: Fri, 6 May 2022 01:22:20 GMT
- Title: Large Scale Transfer Learning for Differentially Private Image
Classification
- Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky
- Abstract summary: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
- Score: 51.10365553035979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential Privacy (DP) provides a formal framework for training machine
learning models with individual example level privacy. Training models with DP
protects the model against leakage of sensitive data in a potentially
adversarial setting. In the field of deep learning, Differentially Private
Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training
algorithm. Private training using DP-SGD protects against leakage by injecting
noise into individual example gradients, such that the trained model weights
become nearly independent of the use of any particular training example. While
this result is quite appealing, the computational cost of training large-scale
models with DP-SGD is substantially higher than non-private training. This is
further exacerbated by the fact that increasing the number of parameters leads
to larger degradation in utility with DP. In this work, we zoom in on the
ImageNet dataset and demonstrate that similar to the non-private case,
pre-training over-parameterized models on a large public dataset can lead to
substantial gains when the model is finetuned privately. Moreover, by
systematically comparing private and non-private models across a range of huge
batch sizes, we find that, similar to the non-private setting, the choice of
optimizer can further improve performance substantially with DP. By switching
from DP-SGD to DP-LAMB we observe an improvement of up to 20 percentage points
(absolute). Finally, we show that finetuning just the last layer for a
\emph{single step} in the full-batch setting yields SOTA results of 81.7$\%$
across a wide privacy budget range of $\epsilon \in [4, 10]$ with
$\delta = 10^{-6}$, while substantially reducing the computational overhead.
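To make the mechanism concrete, the following is a minimal NumPy sketch of the
DP-SGD update described above (per-example gradient clipping followed by
Gaussian noise on the summed gradient), together with a hypothetical single
full-batch private step on a last-layer linear head over frozen pre-trained
features. The function names, the softmax-regression head, and the
hyperparameters are illustrative assumptions rather than the paper's exact
implementation; in particular, `noise_mult` stands in for a noise multiplier
that a privacy accountant would calibrate to the target $(\epsilon, \delta)$.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD update: clip each example's gradient to L2 norm `clip_norm`,
    sum, add Gaussian noise scaled to the clip norm, then average."""
    n = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))          # clipping factors
    clipped = per_example_grads * scale.reshape((n,) + (1,) * params.ndim)
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_mult * clip_norm, size=params.shape)          # Gaussian mechanism
    return params - lr * noisy_sum / n

def last_layer_single_step(features, labels, num_classes, lr, clip_norm,
                           noise_mult, rng):
    """Hypothetical full-batch, single-step private finetune of a linear head
    (softmax regression) on frozen pre-trained features."""
    n, d = features.shape
    w = np.zeros((d, num_classes))                                # head starts at zero
    logits = features @ w
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(num_classes)[labels]
    # per-example gradient of cross-entropy w.r.t. w is outer(feature, prob - onehot)
    per_example_grads = np.einsum('ni,nj->nij', features, probs - onehot)
    return dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng)
```

The sketch omits the privacy accounting itself and the frozen backbone that
would produce `features`.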
Related papers
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for gauging the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracies of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods (a minimal sketch of this idea appears after the list below).
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple the privacy analysis from the experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients (the second sketch after the list below illustrates the idea for a single linear layer).
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- An Efficient DP-SGD Mechanism for Large Scale NLP Models [28.180412581994485]
Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers.
It is desirable that underlying models do not expose private information contained in the training data.
Differentially Private Stochastic Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
arXiv Detail & Related papers (2021-07-14T15:23:27Z)
- DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing.
We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)
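A minimal sketch of the idea referenced in the DP-ZO entry above, assuming an
SPSA-style two-point estimator: the only data-dependent quantity in a
zeroth-order step is a per-example scalar (the finite difference of the loss
along a shared random direction), so clipping each scalar and adding Gaussian
noise to their sum is enough to privatize the update. The function name
`dp_zo_step` and all details are illustrative assumptions, not necessarily the
cited paper's exact algorithm.

```python
import numpy as np

def dp_zo_step(params, loss_fn, batch, lr, mu=1e-3, clip=1.0,
               noise_mult=1.0, rng=np.random.default_rng(0)):
    """One hypothetical privatized zeroth-order (SPSA-style) update.

    loss_fn(params, example) -> scalar loss for a single example."""
    z = rng.standard_normal(params.shape)            # shared random direction
    scalars = []
    for example in batch:
        g = (loss_fn(params + mu * z, example)
             - loss_fn(params - mu * z, example)) / (2 * mu)
        scalars.append(np.clip(g, -clip, clip))      # bound each example's contribution
    noisy = sum(scalars) + noise_mult * clip * rng.standard_normal()
    return params - lr * (noisy / len(batch)) * z    # move along the shared direction
```

A caller would supply a per-example `loss_fn` closing over the frozen model;
as with DP-SGD, `noise_mult` would be calibrated by a privacy accountant.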
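A second sketch, for the memory-saving clipping referenced in the entry on
large language models above: for a single linear layer y = x @ W, example i's
gradient with respect to W is the outer product of its input activation and
its output gradient, so its Frobenius norm is just the product of the two
vector norms, and the clipped gradient sum can be formed by reweighting rows.
The single-layer setting and names are illustrative assumptions; the cited
paper's technique extends this to full networks.

```python
import numpy as np

def clipped_grad_linear(x, g_out, clip_norm):
    """Per-example clipping for a linear layer y = x @ W without materializing
    the (n, d_in, d_out) per-example gradient tensor.

    x:     (n, d_in)  input activations
    g_out: (n, d_out) gradients of the loss w.r.t. the layer outputs"""
    # ||outer(x[i], g_out[i])||_F = ||x[i]|| * ||g_out[i]||, so per-example
    # norms cost only O(n) extra memory.
    per_example_norms = np.linalg.norm(x, axis=1) * np.linalg.norm(g_out, axis=1)
    clip_factor = np.minimum(1.0, clip_norm / (per_example_norms + 1e-12))
    # sum_i c_i * outer(x[i], g_out[i]) == (c * x)^T @ g_out
    return (clip_factor[:, None] * x).T @ g_out      # (d_in, d_out) clipped sum
```

Gaussian noise scaled to `clip_norm` would then be added to this sum before
the optimizer step, exactly as in standard DP-SGD.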