Differentially Private Optimization on Large Model at Small Cost
- URL: http://arxiv.org/abs/2210.00038v2
- Date: Tue, 19 Sep 2023 02:14:06 GMT
- Title: Differentially Private Optimization on Large Model at Small Cost
- Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
- Abstract summary: Differentially private (DP) optimization is the standard paradigm to learn large neural networks that are accurate and privacy-preserving.
Existing DP implementations are 2-1000X more costly in time and space complexity than the standard (non-private) training.
We develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy) with a substantial improvement in computational cost.
- Score: 39.93710312222771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private (DP) optimization is the standard paradigm to learn
large neural networks that are accurate and privacy-preserving. The
computational cost for DP deep learning, however, is notoriously heavy due to
the per-sample gradient clipping. Existing DP implementations are 2-1000X more
costly in time and space complexity than the standard (non-private) training.
In this work, we develop a novel Book-Keeping (BK) technique that implements
existing DP optimizers (thus achieving the same accuracy), with a substantial
improvement on the computational cost. Specifically, BK enables DP training on
large models and high dimensional data to be roughly as fast and memory-saving
as the standard training, whereas previous DP algorithms can be inefficient or
incapable of training due to memory error. The computational advantage of BK is
supported by the complexity analysis as well as extensive experiments on vision
and language tasks. Our implementation achieves state-of-the-art (SOTA)
accuracy with very small extra cost: on GPT2 and at almost the same memory cost
(<1% overhead), BK has 1.03X the time complexity of the standard training
(0.83X training speed in practice), and 0.61X the time complexity of the most
efficient DP implementation (1.36X training speed in practice). We open-source
the codebase for the BK algorithm at the FastDP library
(https://github.com/awslabs/fast-differential-privacy).
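To make the cost gap concrete, below is a minimal PyTorch sketch of the ghost-norm book-keeping idea for a single linear layer; the function name and signature are ours for illustration, not the FastDP API.

```python
import torch

def bk_clipped_grad(a, g, C):
    """Sketch of the ghost-norm book-keeping idea for one linear layer
    with forward pass y = a @ W.T (illustrative, not the FastDP API).
    a: (B, d) layer inputs; g: (B, p) gradients of the loss w.r.t. the
    layer outputs; C: clipping threshold.
    The per-sample weight gradient is the outer product g_i a_i^T, whose
    Frobenius norm factors as ||g_i|| * ||a_i||, so per-sample norms are
    available without materializing a (B, p, d) gradient tensor.
    """
    norms = g.norm(dim=1) * a.norm(dim=1)        # (B,) per-sample gradient norms
    scale = (C / (norms + 1e-6)).clamp(max=1.0)  # clip factors min(1, C / norm)
    # Clipped sum over the batch in one matmul: sum_i c_i * g_i a_i^T.
    # (DP-SGD would then add Gaussian noise once to this summed gradient.)
    return (g * scale.unsqueeze(1)).T @ a        # (p, d) clipped gradient
```

As we read the paper, the inputs `a` and output gradients `g` used above are already produced by one standard back-propagation pass, which is why BK adds almost no time or memory over non-private training.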
Related papers
- Towards Efficient and Scalable Training of Differentially Private Deep Learning [5.825410941577592]
Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP).
Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, so many implementations ignore this requirement.
We conduct a comprehensive empirical study to quantify the computational cost of training deep learning models under DP.
We find that a naive implementation of DP-SGD with Opacus in PyTorch has between 2.6 and 8 times lower throughput (training examples processed per second) than SGD.
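For illustration, a minimal sketch of Poisson subsampling, the sampling scheme assumed by the standard DP-SGD privacy analysis (an assumed helper, not the Opacus API):

```python
import torch

def poisson_subsample(dataset_size: int, sample_rate: float):
    """Each example enters the batch independently with probability
    `sample_rate`, so the batch size itself is random; illustrative
    code, not the Opacus API.
    """
    mask = torch.rand(dataset_size) < sample_rate
    return mask.nonzero(as_tuple=True)[0]  # indices of this step's random batch
```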
arXiv Detail & Related papers (2024-06-25T06:04:58Z)
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for gauging the degree of privacy protection provided to models.
DP is, however, not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy achieves DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021, respectively.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
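A rough sketch of what a privatized zeroth-order update could look like; all names, signatures, and constants here are our assumptions, not the paper's code. The key property is that the only data-dependent quantity is a per-example scalar finite difference, which is cheap to clip and noise.

```python
import torch

@torch.no_grad()
def dp_zo_step(loss_fn, params, batch, mu=1e-3, C=1.0, sigma=1.0, lr=1e-4):
    """Illustrative DP zeroth-order step in the spirit of DP-ZO (our
    assumptions, not the paper's code). loss_fn(params, x) is assumed
    to return a scalar per-example loss.
    """
    z = [torch.randn_like(p) for p in params]   # one shared random direction
    for p, zi in zip(params, z):                # evaluate at theta + mu * z
        p.add_(mu * zi)
    loss_plus = torch.stack([loss_fn(params, x) for x in batch])
    for p, zi in zip(params, z):                # evaluate at theta - mu * z
        p.sub_(2 * mu * zi)
    loss_minus = torch.stack([loss_fn(params, x) for x in batch])
    for p, zi in zip(params, z):                # restore theta
        p.add_(mu * zi)
    # Per-example projected-gradient scalars, clipped to [-C, C].
    s = ((loss_plus - loss_minus) / (2 * mu)).clamp(-C, C)
    # One Gaussian draw privatizes the summed scalar.
    noisy = (s.sum() + sigma * C * torch.randn(())) / len(batch)
    for p, zi in zip(params, z):                # descend along direction z
        p.sub_(lr * noisy * zi)
```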
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Zero redundancy distributed learning with differential privacy [26.89679585840689]
We develop a new systematic solution, DP-ZeRO, to scale up the trainable DP model size.
Our DP-ZeRO has the potential to train models with arbitrary size and is evaluated on the world's largest DP models.
arXiv Detail & Related papers (2023-11-20T14:58:56Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
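A one-line sketch of the decoupling heuristic as we read the abstract (our reading; the constant relationship is illustrative): the dynamics of noisy training are roughly governed by the ratio of noise to batch size, so hyperparameters can be searched cheaply at a small batch size and transferred by keeping that ratio fixed.

```python
def transferred_noise(sigma_ref: float, B_ref: int, B: int) -> float:
    """Keep sigma / B constant when moving from a cheap reference run
    at batch size B_ref to the full private run at batch size B
    (illustrative heuristic, not the paper's exact rule).
    """
    return sigma_ref * (B / B_ref)
```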
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Differentially Private Bias-Term Fine-tuning of Foundation Models [36.55810474925956]
We study the problem of differentially private (DP) fine-tuning of large pre-trained models.
We propose DP-BiTFiT, which matches the state-of-the-art accuracy for DP algorithms and the efficiency of the standard BiTFiT.
On a wide range of tasks, DP-BiTFiT is 230X faster and uses 28X less memory than DP full fine-tuning.
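The efficiency comes from training only the bias terms. A minimal sketch of that setup (an illustrative helper, not the paper's code):

```python
import torch

def freeze_all_but_biases(model: torch.nn.Module):
    """Train only bias terms, as in BiTFiT. Because the per-sample bias
    gradient equals the gradient w.r.t. the layer output, DP clipping
    needs no stored activations and no per-sample weight gradients.
    """
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
```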
arXiv Detail & Related papers (2022-09-30T18:30:48Z)
- Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy [10.098114696565865]
Large convolutional neural networks (CNN) can be difficult to train in the differentially private (DP) regime.
We propose an efficient and scalable implementation of per-sample gradient clipping for convolutional layers, termed mixed ghost clipping.
We achieve 96.7% accuracy on CIFAR10 and 83.0% on CIFAR100 at $\epsilon=1$ using BEiT, while the previous best results are 94.8% and 67.4%, respectively.
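As we read the paper, "mixed" means choosing per layer between the ghost-norm trick and directly materializing per-sample gradients, whichever is cheaper; the constants below are illustrative, not the paper's exact rule.

```python
def prefer_ghost_norm(T: int, d: int, p: int) -> bool:
    """Layer-wise choice in the spirit of mixed ghost clipping
    (illustrative): the ghost-norm trick costs about 2 * T^2 extra
    floats per sample (T = number of output positions after unfolding
    a convolution), while materializing the per-sample gradient costs
    d * p floats (fan-in x fan-out).
    """
    return 2 * T * T <= d * p
```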
arXiv Detail & Related papers (2022-05-21T22:01:12Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- Fast and Memory Efficient Differentially Private-SGD via JL Projections [29.37156662314245]
DP-SGD and its variants are the only known algorithms for private training of large-scale neural networks.
We present a new framework for designing differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
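The JL idea can be illustrated with a small norm estimator (our illustration, not the paper's code): project each per-sample gradient onto a few random Gaussian directions, which is computable batch-wise without materializing the gradients, and recover the norms from the projections.

```python
import torch

def jl_norm_estimates(dots: torch.Tensor) -> torch.Tensor:
    """Suppose dots[i, j] = <g_i, v_j> for k i.i.d. standard Gaussian
    directions v_j. Since E[<g, v>^2] = ||g||^2 for v ~ N(0, I),
    averaging the squared projections estimates each per-sample
    gradient norm without forming g_i (illustrative, not the paper's code).
    """
    return dots.pow(2).mean(dim=1).sqrt()  # (B,) approximate per-sample norms
```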
arXiv Detail & Related papers (2021-02-05T06:02:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.