Exploring the Limits of Differentially Private Deep Learning with
Group-wise Clipping
- URL: http://arxiv.org/abs/2212.01539v1
- Date: Sat, 3 Dec 2022 05:20:15 GMT
- Title: Exploring the Limits of Differentially Private Deep Learning with
Group-wise Clipping
- Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin
Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian
- Abstract summary: We show that \emph{per-layer clipping} allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
- Score: 91.60608388479645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private deep learning has recently witnessed advances in
computational efficiency and privacy-utility trade-off. We explore whether
further improvements along the two axes are possible and provide affirmative
answers leveraging two instantiations of \emph{group-wise clipping}. To reduce
the compute time overhead of private learning, we show that \emph{per-layer
clipping}, where the gradient of each neural network layer is clipped
separately, allows clipping to be performed in conjunction with backpropagation
in differentially private optimization. This results in private learning that
is as memory-efficient and almost as fast per training update as non-private
learning for many workflows of interest. While per-layer clipping with constant
thresholds tends to underperform standard flat clipping, per-layer clipping
with adaptive thresholds matches or outperforms flat clipping under given
training epoch constraints, hence attaining similar or better task performance
within less wall time. To explore the limits of scaling (pretrained) models in
differentially private deep learning, we privately fine-tune the 175
billion-parameter GPT-3. We bypass scaling challenges associated with clipping
gradients that are distributed across multiple devices with \emph{per-device
clipping} that clips the gradient of each model piece separately on its host
device. Privately fine-tuning GPT-3 with per-device clipping achieves a task
performance at $\epsilon=1$ better than what is attainable by non-privately
fine-tuning the largest GPT-2 on a summarization task.
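To make the per-layer clipping rule concrete, below is a minimal PyTorch sketch that clips each parameter tensor's per-example gradient separately and then adds Gaussian noise. The toy model, per-layer thresholds, and noise multiplier are illustrative assumptions; the paper's actual implementation fuses this clipping with backpropagation so per-example gradients never have to be stored, which this sketch does not attempt.

```python
import torch
from torch import nn
from torch.func import functional_call, grad, vmap

# Toy model and batch (placeholders; the paper fine-tunes large pretrained models).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
params = dict(model.named_parameters())
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))

def loss_fn(p, xi, yi):
    logits = functional_call(model, p, (xi.unsqueeze(0),))
    return nn.functional.cross_entropy(logits, yi.unsqueeze(0))

# Per-example gradients, one entry per parameter tensor ("layer").
per_example_grads = vmap(grad(loss_fn), in_dims=(None, 0, 0))(params, x, y)

thresholds = {name: 1.0 for name in params}   # constant per-layer thresholds (assumed)
sigma = 1.0                                   # noise multiplier (assumed)
noisy_grads = {}
for name, g in per_example_grads.items():     # g has shape (batch, *param_shape)
    norms = g.flatten(1).norm(dim=1)          # norm of THIS layer's gradient only
    scale = (thresholds[name] / (norms + 1e-6)).clamp(max=1.0)
    clipped = g * scale.view(-1, *([1] * (g.dim() - 1)))
    noisy_grads[name] = clipped.sum(0) + sigma * thresholds[name] * torch.randn_like(g[0])
```

Flat clipping would instead compute one norm over the concatenation of all layers' gradients; clipping each layer on its own is what allows a layer's per-example gradients to be discarded as soon as backpropagation has passed it.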
Related papers
- On the accuracy and efficiency of group-wise clipping in differentially private optimization [38.80873569002277]
We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off.
We demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models.
arXiv Detail & Related papers (2023-10-30T01:01:15Z)
- Evaluating Privacy Leakage in Split Learning [8.841387955312669]
On-device machine learning allows us to avoid sharing raw data with a third-party server during inference.
Split Learning (SL) is a promising approach that can overcome the compute limitations of running large models entirely on-device.
In SL, a large machine learning model is divided into two parts, with the bigger part residing on the server and the smaller part executing on-device.
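As a toy illustration of that split (the cut point and layer sizes below are assumptions, not taken from the paper):

```python
import torch
from torch import nn

# A full model cut into an on-device head and a server-side remainder.
full = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # smaller part: executes on-device
    nn.Linear(64, 64), nn.ReLU(),   # bigger part: resides on the server
    nn.Linear(64, 2),
)
device_part, server_part = full[:2], full[2:]

x = torch.randn(4, 10)              # raw data stays on the device
smashed = device_part(x)            # only these intermediate activations are sent
logits = server_part(smashed)       # the server never sees the raw inputs
```

The privacy question studied above is what those transmitted activations still reveal about the raw inputs.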
arXiv Detail & Related papers (2023-05-22T13:00:07Z)
- Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy [10.098114696565865]
Large convolutional neural networks (CNN) can be difficult to train in the differentially private (DP) regime.
We propose an efficient and scalable implementation of per-example gradient clipping on convolutional layers, termed mixed ghost clipping.
We achieve 96.7% accuracy on CIFAR10 and 83.0% on CIFAR100 at $\epsilon=1$ using BEiT, while the previous best results are 94.8% and 67.4%, respectively.
arXiv Detail & Related papers (2022-05-21T22:01:12Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this guarantee is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
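For reference, a minimal sketch of one such DP-SGD update with standard flat clipping; the clipping norm and noise multiplier are illustrative values, not the paper's settings.

```python
import torch

def dp_sgd_update(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    # per_example_grads: (batch, num_params), one flattened gradient per example.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clipped = per_example_grads * (clip_norm / (norms + 1e-6)).clamp(max=1.0)
    # Gaussian noise calibrated to the clipping threshold masks any one example.
    noise = noise_multiplier * clip_norm * torch.randn(per_example_grads.shape[1])
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]

update = dp_sgd_update(torch.randn(32, 1000))  # 32 examples, 1000 parameters
```

Much of the extra cost comes from having to materialize per_example_grads at all, which is what the memory-saving techniques in the surrounding papers try to avoid.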
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
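A sketch of the core identity behind such memory-saving ("ghost") clipping, shown here only for a linear layer with non-sequential inputs (the shapes are illustrative):

```python
import torch

B, d, p = 8, 128, 64
a = torch.randn(B, d)   # per-example layer inputs (activations)
s = torch.randn(B, p)   # per-example gradients w.r.t. the layer outputs

# The per-example weight gradient is the outer product s_i a_i^T, so its norm
# factorizes: ||s_i a_i^T||_F^2 = ||s_i||^2 * ||a_i||^2.  The norm needed for
# clipping is therefore available without forming any per-example gradient.
ghost_sq_norms = s.pow(2).sum(1) * a.pow(2).sum(1)

# Sanity check against materialized per-example gradients.
per_example = torch.einsum('bp,bd->bpd', s, a)
assert torch.allclose(ghost_sq_norms,
                      per_example.flatten(1).pow(2).sum(1), atol=1e-3)
```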
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
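A minimal sketch of the idea: freeze the pretrained weights and train only a task-specific difference vector under a sparsity penalty. The tiny model is a placeholder, and the method's relaxed L0 penalty is replaced by a plain L1 term for brevity.

```python
import torch
from torch import nn

pretrained = nn.Linear(10, 2)                  # stands in for a pretrained model
for p in pretrained.parameters():
    p.requires_grad_(False)                    # shared weights stay frozen

# One trainable diff tensor per pretrained parameter, initialized at zero.
diff = {n: nn.Parameter(torch.zeros_like(p))
        for n, p in pretrained.named_parameters()}
opt = torch.optim.SGD(diff.values(), lr=0.1)

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
logits = nn.functional.linear(x,
                              pretrained.weight + diff['weight'],
                              pretrained.bias + diff['bias'])
task_loss = nn.functional.cross_entropy(logits, y)
sparsity = sum(d.abs().sum() for d in diff.values())  # L1 stand-in for relaxed L0
(task_loss + 1e-3 * sparsity).backward()
opt.step()                                            # only the diff vector moves
```

Only the (ideally sparse) diff needs to be stored per task, while the pretrained weights are shared across tasks.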
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping [15.410557873153833]
Recent work has shown the feasibility of applying differential privacy to deep learning tasks.
Despite their promise, differentially private deep networks often lag far behind their non-private counterparts in accuracy.
One barrier to expanded research in this area is the training time, which is often orders of magnitude longer than for non-private networks.
arXiv Detail & Related papers (2020-09-07T13:51:26Z)
- AutoClip: Adaptive Gradient Clipping for Source Separation Networks [45.58157519349822]
AutoClip is a method for automatically and adaptively choosing a gradient clipping threshold.
Experiments show that applying AutoClip results in improved performance for audio source separation networks.
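A sketch of the method's core loop: keep a running history of observed gradient norms and clip each step at a chosen percentile of that history (the percentile value below is an assumption for illustration).

```python
import numpy as np
import torch

grad_norm_history = []

def autoclip_gradients(model, percentile=10.0):
    # Total gradient norm across all parameters at the current step.
    total_norm = torch.norm(torch.stack(
        [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]))
    grad_norm_history.append(total_norm.item())
    # Adaptive threshold: the p-th percentile of every norm observed so far.
    clip_value = float(np.percentile(grad_norm_history, percentile))
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
```

Called between loss.backward() and optimizer.step(), this adapts the threshold to the norms the model actually produces instead of relying on a hand-tuned constant.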
arXiv Detail & Related papers (2020-07-25T20:59:39Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
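In symbols, with clipping threshold $C$, each per-example gradient $g_i$ is rescaled as
$$\bar{g}_i = g_i \cdot \min\left(1, \frac{C}{\lVert g_i \rVert_2}\right),$$
so gradients with norm at most $C$ pass through unchanged while larger ones are shrunk onto the radius-$C$ ball.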
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.