Exploring the Limits of Differentially Private Deep Learning with
Group-wise Clipping
- URL: http://arxiv.org/abs/2212.01539v1
- Date: Sat, 3 Dec 2022 05:20:15 GMT
- Title: Exploring the Limits of Differentially Private Deep Learning with
Group-wise Clipping
- Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin
Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian
- Abstract summary: We show that \emph{per-layer clipping} allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
- Score: 91.60608388479645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private deep learning has recently witnessed advances in
computational efficiency and privacy-utility trade-off. We explore whether
further improvements along the two axes are possible and provide affirmative
answers leveraging two instantiations of \emph{group-wise clipping}. To reduce
the compute time overhead of private learning, we show that \emph{per-layer
clipping}, where the gradient of each neural network layer is clipped
separately, allows clipping to be performed in conjunction with backpropagation
in differentially private optimization. This results in private learning that
is as memory-efficient and almost as fast per training update as non-private
learning for many workflows of interest. While per-layer clipping with constant
thresholds tends to underperform standard flat clipping, per-layer clipping
with adaptive thresholds matches or outperforms flat clipping under given
training epoch constraints, hence attaining similar or better task performance
within less wall time. To explore the limits of scaling (pretrained) models in
differentially private deep learning, we privately fine-tune the 175
billion-parameter GPT-3. We bypass scaling challenges associated with clipping
gradients that are distributed across multiple devices with \emph{per-device
clipping} that clips the gradient of each model piece separately on its host
device. Privately fine-tuning GPT-3 with per-device clipping achieves a task
performance at $\epsilon=1$ better than what is attainable by non-privately
fine-tuning the largest GPT-2 on a summarization task.
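To make the per-layer clipping rule concrete, below is a minimal PyTorch sketch that clips each parameter tensor's per-example gradient separately and then adds Gaussian noise. The toy model, per-layer thresholds, and noise multiplier are illustrative assumptions; the paper's actual implementation fuses this clipping with backpropagation so per-example gradients never have to be stored, which this sketch does not attempt.

```python
import torch
from torch import nn
from torch.func import functional_call, grad, vmap

# Toy model and batch (placeholders; the paper fine-tunes large pretrained models).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
params = dict(model.named_parameters())
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))

def loss_fn(p, xi, yi):
    logits = functional_call(model, p, (xi.unsqueeze(0),))
    return nn.functional.cross_entropy(logits, yi.unsqueeze(0))

# Per-example gradients, one entry per parameter tensor ("layer").
per_example_grads = vmap(grad(loss_fn), in_dims=(None, 0, 0))(params, x, y)

thresholds = {name: 1.0 for name in params}   # constant per-layer thresholds (assumed)
sigma = 1.0                                   # noise multiplier (assumed)
noisy_grads = {}
for name, g in per_example_grads.items():     # g has shape (batch, *param_shape)
    norms = g.flatten(1).norm(dim=1)          # norm of THIS layer's gradient only
    scale = (thresholds[name] / (norms + 1e-6)).clamp(max=1.0)
    clipped = g * scale.view(-1, *([1] * (g.dim() - 1)))
    noisy_grads[name] = clipped.sum(0) + sigma * thresholds[name] * torch.randn_like(g[0])
```

Flat clipping would instead compute one norm over the concatenation of all layers' gradients; clipping each layer on its own is what allows a layer's per-example gradients to be discarded as soon as backpropagation has passed it.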
Related papers
- On the accuracy and efficiency of group-wise clipping in differentially private optimization [38.80873569002277]
We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off.
We demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models.
arXiv Detail & Related papers (2023-10-30T01:01:15Z)
- Evaluating Privacy Leakage in Split Learning [8.841387955312669]
On-device machine learning allows us to avoid sharing raw data with a third-party server during inference.
Split Learning (SL) is a promising approach that can overcome the compute limitations of running large models entirely on-device.
In SL, a large machine learning model is divided into two parts, with the bigger part residing on the server and the smaller part executing on-device.
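As a toy illustration of that split (the cut point and layer sizes below are assumptions, not taken from the paper):

```python
import torch
from torch import nn

# A full model cut into an on-device head and a server-side remainder.
full = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # smaller part: executes on-device
    nn.Linear(64, 64), nn.ReLU(),   # bigger part: resides on the server
    nn.Linear(64, 2),
)
device_part, server_part = full[:2], full[2:]

x = torch.randn(4, 10)              # raw data stays on the device
smashed = device_part(x)            # only these intermediate activations are sent
logits = server_part(smashed)       # the server never sees the raw inputs
```

The privacy question studied above is what those transmitted activations still reveal about the raw inputs.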
arXiv Detail & Related papers (2023-05-22T13:00:07Z)
- Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy [10.098114696565865]
Large convolutional neural networks (CNN) can be difficult to train in the differentially private (DP) regime.
We propose an efficient and scalable implementation of per-example gradient clipping on convolutional layers, termed mixed ghost clipping.
We achieve 96.7% accuracy on CIFAR10 and 83.0% on CIFAR100 at $\epsilon=1$ using BEiT, while the previous best results are 94.8% and 67.4%, respectively.
arXiv Detail & Related papers (2022-05-21T22:01:12Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this guarantee is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
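For reference, a minimal sketch of one such DP-SGD update with standard flat clipping; the clipping norm and noise multiplier are illustrative values, not the paper's settings.

```python
import torch

def dp_sgd_update(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    # per_example_grads: (batch, num_params), one flattened gradient per example.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clipped = per_example_grads * (clip_norm / (norms + 1e-6)).clamp(max=1.0)
    # Gaussian noise calibrated to the clipping threshold masks any one example.
    noise = noise_multiplier * clip_norm * torch.randn(per_example_grads.shape[1])
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]

update = dp_sgd_update(torch.randn(32, 1000))  # 32 examples, 1000 parameters
```

Much of the extra cost comes from having to materialize per_example_grads at all, which is what the memory-saving techniques in the surrounding papers try to avoid.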
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
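A sketch of the core identity behind such memory-saving ("ghost") clipping, shown here only for a linear layer with non-sequential inputs (the shapes are illustrative):

```python
import torch

B, d, p = 8, 128, 64
a = torch.randn(B, d)   # per-example layer inputs (activations)
s = torch.randn(B, p)   # per-example gradients w.r.t. the layer outputs

# The per-example weight gradient is the outer product s_i a_i^T, so its norm
# factorizes: ||s_i a_i^T||_F^2 = ||s_i||^2 * ||a_i||^2.  The norm needed for
# clipping is therefore available without forming any per-example gradient.
ghost_sq_norms = s.pow(2).sum(1) * a.pow(2).sum(1)

# Sanity check against materialized per-example gradients.
per_example = torch.einsum('bp,bd->bpd', s, a)
assert torch.allclose(ghost_sq_norms,
                      per_example.flatten(1).pow(2).sum(1), atol=1e-3)
```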
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
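A minimal sketch of the idea: freeze the pretrained weights and train only a task-specific difference vector under a sparsity penalty. The tiny model is a placeholder, and the method's relaxed L0 penalty is replaced by a plain L1 term for brevity.

```python
import torch
from torch import nn

pretrained = nn.Linear(10, 2)                  # stands in for a pretrained model
for p in pretrained.parameters():
    p.requires_grad_(False)                    # shared weights stay frozen

# One trainable diff tensor per pretrained parameter, initialized at zero.
diff = {n: nn.Parameter(torch.zeros_like(p))
        for n, p in pretrained.named_parameters()}
opt = torch.optim.SGD(diff.values(), lr=0.1)

x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
logits = nn.functional.linear(x,
                              pretrained.weight + diff['weight'],
                              pretrained.bias + diff['bias'])
task_loss = nn.functional.cross_entropy(logits, y)
sparsity = sum(d.abs().sum() for d in diff.values())  # L1 stand-in for relaxed L0
(task_loss + 1e-3 * sparsity).backward()
opt.step()                                            # only the diff vector moves
```

Only the (ideally sparse) diff needs to be stored per task, while the pretrained weights are shared across tasks.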
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping [15.410557873153833]
Recent work has shown the feasibility of applying differential privacy to deep learning tasks.
Despite their promise, differentially private deep networks often lag far behind their non-private counterparts in accuracy.
One barrier to expanded research in this area is the training time, which is often orders of magnitude longer than for non-private networks.
arXiv Detail & Related papers (2020-09-07T13:51:26Z)
- AutoClip: Adaptive Gradient Clipping for Source Separation Networks [45.58157519349822]
AutoClip is a method for automatically and adaptively choosing a gradient clipping threshold.
Experiments show that applying AutoClip results in improved performance for audio source separation networks.
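A sketch of the method's core loop: keep a running history of observed gradient norms and clip each step at a chosen percentile of that history (the percentile value below is an assumption for illustration).

```python
import numpy as np
import torch

grad_norm_history = []

def autoclip_gradients(model, percentile=10.0):
    # Total gradient norm across all parameters at the current step.
    total_norm = torch.norm(torch.stack(
        [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]))
    grad_norm_history.append(total_norm.item())
    # Adaptive threshold: the p-th percentile of every norm observed so far.
    clip_value = float(np.percentile(grad_norm_history, percentile))
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
```

Called between loss.backward() and optimizer.step(), this adapts the threshold to the norms the model actually produces instead of relying on a hand-tuned constant.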
arXiv Detail & Related papers (2020-07-25T20:59:39Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
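In symbols, with clipping threshold $C$, each per-example gradient $g_i$ is rescaled as
$$\bar{g}_i = g_i \cdot \min\left(1, \frac{C}{\lVert g_i \rVert_2}\right),$$
so gradients with norm at most $C$ pass through unchanged while larger ones are shrunk onto the radius-$C$ ball.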
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.