Fast and Memory Efficient Differentially Private-SGD via JL Projections
- URL: http://arxiv.org/abs/2102.03013v1
- Date: Fri, 5 Feb 2021 06:02:10 GMT
- Title: Fast and Memory Efficient Differentially Private-SGD via JL Projections
- Authors: Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat
- Abstract summary: DP-SGD is the only known algorithm for private training of large-scale neural networks.
We present a new framework for designing differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
- Score: 29.37156662314245
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations
are the only known algorithms for private training of large-scale neural
networks. This algorithm requires computing per-sample gradient norms, which
is extremely slow and memory intensive in practice. In this paper, we
present a new framework to design differentially private optimizers called
DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL)
projections to quickly approximate the per-sample gradient norms without
exactly computing them, thus making the training time and memory requirements
of our optimizers closer to those of their non-DP versions.
Unlike previous attempts to make DP-SGD faster, which work only on a subset of
network architectures or rely on compiler techniques, we propose an algorithmic
solution that works for any network in a black-box manner; this is the main
contribution of this paper. To illustrate this, we train a Recurrent Neural
Network (RNN) on the IMDb dataset and achieve a good privacy-vs-accuracy
tradeoff, while being significantly faster than DP-SGD and with a memory
footprint similar to that of non-private SGD. The privacy analysis of our
algorithms is more involved than that of DP-SGD; we use the recently proposed
f-DP framework of Dong et al. (2019) to prove privacy.
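As a rough illustration of the JL trick described in the abstract, the following minimal NumPy sketch (not the paper's optimizer) estimates per-sample gradient norms from a few Gaussian projections. It materializes the per-sample gradient matrix G only for clarity; DP-SGD-JL computes each product Gv with fast forward/backward passes, never forming G. All sizes are illustrative.

```python
# JL norm estimation: for v ~ N(0, I_d), E[<g, v>^2] = ||g||^2, so a few
# random projections suffice to approximate every per-sample gradient norm.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 128, 10_000, 20            # batch size, #parameters, #projections

G = rng.normal(size=(n, d))          # stand-in for per-sample gradients
V = rng.normal(size=(d, k))          # Gaussian JL projection vectors

P = G @ V                            # P[i, j] = <g_i, v_j>
est_norms = np.sqrt((P ** 2).mean(axis=1))
true_norms = np.linalg.norm(G, axis=1)

print(np.abs(est_norms / true_norms - 1).max())  # error shrinks as k grows
```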
Related papers
- DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private optimizers.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z)
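The blurb above names a simplified Kalman filter; the sketch below is a heavily simplified, hypothetical rendering of that idea (a fixed-gain Kalman-style update, which degenerates to an exponential moving average), not DiSK's actual filter. The gain `kappa` and all shapes are assumptions.

```python
# Fixed-gain Kalman-style smoothing of noisy privatized gradients.
import numpy as np

rng = np.random.default_rng(0)
d, kappa = 1_000, 0.3                # parameter count, filter gain (assumed)

state = np.zeros(d)                  # filtered gradient estimate
for step in range(100):
    noisy_grad = rng.normal(size=d)  # stand-in for a clipped+noised gradient
    # blend prediction (previous state) with the new observation; the
    # optimizer would consume `state` instead of `noisy_grad`
    state = (1 - kappa) * state + kappa * noisy_grad
```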
- Towards Efficient and Scalable Training of Differentially Private Deep Learning [5.825410941577592]
Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP).
Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, which leads many implementations to ignore this requirement.
We conduct a comprehensive empirical study to quantify the computational cost of training deep learning models under DP.
We find that a naive implementation of DP-SGD with Opacus in PyTorch processes between 2.6 and 8 times fewer training examples per second than SGD.
arXiv Detail & Related papers (2024-06-25T06:04:58Z)
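For reference, here is a minimal sketch of the kind of Opacus-based DP-SGD loop whose throughput the study above measures, assuming opacus >= 1.0; the toy model, data, and hyperparameters are placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,            # sigma
    max_grad_norm=1.0,               # per-sample clipping threshold C
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()  # Opacus hooks compute per-sample grads
    optimizer.step()                   # clip, aggregate, add noise, update
```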
- LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models [8.92538797216985]
We present a characterization of private RecSys training using DP-SGD, root-causing several of its performance bottlenecks.
We propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD.
Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119x training throughput improvement.
arXiv Detail & Related papers (2024-04-12T23:32:06Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
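A minimal sketch in the spirit of DP-ZO, not the paper's exact algorithm: estimate a directional derivative from two loss evaluations along a shared random direction, clip that per-example scalar, and privatize it with Gaussian noise. The toy loss and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, lr, C, sigma = 50, 1e-3, 0.1, 1.0, 0.5

def loss(theta, x):                  # toy per-example loss
    return 0.5 * np.sum((theta - x) ** 2)

theta = np.zeros(d)
batch = rng.normal(size=(32, d))

z = rng.normal(size=d)               # shared random direction
# per-example finite-difference estimate of the directional derivative
s = np.array([(loss(theta + eps * z, x) - loss(theta - eps * z, x)) / (2 * eps)
              for x in batch])
s = np.clip(s, -C, C)                # each example contributes at most C
# Gaussian noise calibrated to the clipped scalar's per-example sensitivity
priv = s.mean() + rng.normal(scale=2 * sigma * C / len(batch))
theta -= lr * priv * z               # update along the sampled direction
```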
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
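A small illustration of the failure mode described above, not the paper's algorithms: dense Gaussian noise turns a sparse embedding gradient fully dense, while noising only a selected subset of rows preserves sparsity. Note that selecting exactly the touched rows would leak access patterns, which is why DP-FEST/DP-AdaFEST use private selection rules; the rule below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, sigma = 10_000, 16, 1.0

grad = np.zeros((vocab, dim))
touched = rng.choice(vocab, size=32, replace=False)
grad[touched] = rng.normal(size=(32, dim))   # sparse embedding gradient

dense = grad + rng.normal(scale=sigma, size=grad.shape)
print((dense != 0).mean())                   # ~1.0: sparsity destroyed

sparse = grad.copy()                         # placeholder selection rule only
sparse[touched] += rng.normal(scale=sigma, size=(32, dim))
print((sparse != 0).mean())                  # ~0.003: sparsity preserved
```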
- DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, a differentially private version of the stochastic gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement for the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z)
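A generic sketch of the importance-sampling ingredient DPIS builds on, not its actual mechanism: draw examples with non-uniform probabilities and reweight by 1/(n * p_i) so the gradient estimate stays unbiased. The utility scores below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1_000, 20, 64
X = rng.normal(size=(n, d))              # stand-in per-example gradients

scores = np.linalg.norm(X, axis=1)       # placeholder utility scores
p = scores / scores.sum()                # sampling distribution

idx = rng.choice(n, size=m, p=p)
weights = 1.0 / (n * p[idx])             # inverse-propensity weights
grad_est = (weights[:, None] * X[idx]).mean(axis=0)

print(np.linalg.norm(grad_est - X.mean(axis=0)))  # small: unbiased estimate
```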
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state of the art on ImageNet with a +9 point gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
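A minimal sketch of the decoupling referred to above, under the scaling law studied in the TAN paper: performance at a fixed step count is governed by the ratio sigma/B, so one can tune hyperparameters cheaply at a small batch size and then scale sigma and the batch size together for the final private run. The reference values are illustrative.

```python
def scaled_sigma(sigma_ref: float, batch_ref: int, batch_new: int) -> float:
    """Keep sigma/B constant when changing the batch size."""
    return sigma_ref * batch_new / batch_ref

# e.g. behavior explored at B=256 with sigma=1.2, final run at B=4096
print(scaled_sigma(1.2, 256, 4096))      # -> 19.2
```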
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
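A minimal sketch contrasting the two per-sample treatments compared above: DP-SGD clips each gradient to norm at most C, DP-NSGD normalizes it with a small regularizer r, and both then aggregate and add Gaussian noise. Shapes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 100))       # stand-in per-sample gradients
C, r, sigma = 1.0, 0.01, 0.5

norms = np.linalg.norm(grads, axis=1, keepdims=True)
clipped = grads * np.minimum(1.0, C / norms)   # DP-SGD: clip to norm <= C
normalized = C * grads / (norms + r)           # DP-NSGD: normalize

def privatize(g):
    """Average and add Gaussian noise scaled to the per-sample bound C."""
    return g.mean(axis=0) + rng.normal(scale=sigma * C / len(g), size=g.shape[1])

update_sgd, update_nsgd = privatize(clipped), privatize(normalized)
```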
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning [12.215964287323876]
We investigate and leverage a connection between Differential Privacy (DP) and the notion of Distributional Generalization (DG).
We introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD).
arXiv Detail & Related papers (2022-04-07T05:41:40Z)