Scalable and Efficient Training of Large Convolutional Neural Networks
with Differential Privacy
- URL: http://arxiv.org/abs/2205.10683v1
- Date: Sat, 21 May 2022 22:01:12 GMT
- Title: Scalable and Efficient Training of Large Convolutional Neural Networks
with Differential Privacy
- Authors: Zhiqi Bu, Jialin Mao, Shiyun Xu
- Abstract summary: Large convolutional neural networks (CNN) can be difficult to train in the differentially private (DP) regime.
We propose an efficient and scalable implementation of this clipping on convolutional layers, termed the mixed ghost clipping.
We achieve 96.7% accuracy on CIFAR10 and 83.0% on CIFAR100 at $\epsilon=1$ using BEiT, while the previous best results are 94.8% and 67.4%, respectively.
- Score: 10.098114696565865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large convolutional neural networks (CNN) can be difficult to train in the
differentially private (DP) regime, since the optimization algorithms require a
computationally expensive operation known as per-sample gradient clipping.
We propose an efficient and scalable implementation of this clipping on
convolutional layers, termed the mixed ghost clipping, that significantly
eases private training in terms of both time and space complexity,
without affecting accuracy. The improvement in efficiency is rigorously
studied through the first complexity analysis of the mixed ghost clipping and
existing DP training algorithms.
Extensive experiments on vision classification tasks, with large ResNet, VGG,
and Vision Transformers, demonstrate that DP training with mixed ghost clipping
adds $1\sim 10\%$ memory overhead and a $<2\times$ slowdown relative to standard
non-private training. Specifically, when training VGG19 on CIFAR10, mixed
ghost clipping is $3\times$ faster than the state-of-the-art Opacus library, with
an $18\times$ larger maximum batch size. To emphasize the significance of
efficient DP training on convolutional layers, we achieve 96.7\% accuracy on
CIFAR10 and 83.0\% on CIFAR100 at $\epsilon=1$ using BEiT, while the previous
best results are 94.8\% and 67.4\%, respectively. We open-source a privacy
engine (\url{https://github.com/JialinMao/private_CNN}) that implements DP
training of CNNs in a few lines of code.
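For context, the sketch below illustrates the per-sample gradient clipping step that makes DP optimization expensive: the naive implementation materializes one gradient per sample, which is exactly the time and space cost that ghost-clipping approaches avoid. It is a minimal, illustrative DP-SGD step, not the mixed ghost clipping algorithm itself, and the model, loss, and hyperparameters are placeholders.
```python
# Naive per-sample gradient clipping (the DP-SGD bottleneck), for illustration only.
import torch

def dp_sgd_step(model, loss_fn, x, y, optimizer, max_grad_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for i in range(x.shape[0]):                      # one backward pass per sample
        loss = loss_fn(model(x[i:i + 1]), y[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        # clip the whole per-sample gradient to norm <= max_grad_norm
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (max_grad_norm / (norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Gaussian noise calibrated to the clipping threshold, then average and step
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * max_grad_norm
        p.grad = (s + noise) / x.shape[0]
    optimizer.step()
```
Ghost-style clipping, by contrast, computes the per-sample gradient norms without ever materializing the per-sample gradients, which is where the memory and speed gains reported above come from.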
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
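As background on the hashing primitive named above, here is a minimal random-hyperplane LSH sketch that buckets similar feature vectors under the same integer code; it is a generic illustration of LSH, not the HASTE module, and the shapes and bit width are made up.
```python
# Generic random-hyperplane LSH: similar vectors tend to receive the same code,
# so redundant feature vectors can be grouped into buckets and processed once.
import torch

def lsh_codes(features, num_bits=8, seed=0):
    # features: (n, d) flattened feature vectors
    g = torch.Generator().manual_seed(seed)
    hyperplanes = torch.randn(features.shape[1], num_bits, generator=g)
    bits = (features @ hyperplanes) > 0              # (n, num_bits) sign pattern
    powers = 2 ** torch.arange(num_bits)
    return (bits.long() * powers).sum(dim=1)         # integer bucket id per vector

buckets = lsh_codes(torch.randn(16, 64))             # vectors sharing an id are near-duplicates
```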
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
arXiv Detail & Related papers (2023-01-01T02:33:03Z) - Exploring the Limits of Differentially Private Deep Learning with
Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
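As a rough illustration of group-wise clipping (not the specific algorithm of this paper), the sketch below clips each layer's per-sample gradients against its own threshold, which is what allows a layer to be clipped as soon as its gradients appear during backpropagation; the layer names, shapes, and thresholds are invented.
```python
# Group-wise (per-layer) clipping applied to already-materialized per-sample gradients.
import torch

def per_layer_clip(per_sample_grads, thresholds):
    # per_sample_grads: {name: tensor of shape (batch, *param_shape)}
    clipped = {}
    for name, g in per_sample_grads.items():
        norms = g.flatten(start_dim=1).norm(dim=1)                  # (batch,)
        scale = (thresholds[name] / (norms + 1e-6)).clamp(max=1.0)
        clipped[name] = g * scale.view(-1, *([1] * (g.dim() - 1)))
    return clipped

grads = {"conv1.weight": torch.randn(8, 16, 3, 3, 3),
         "fc.weight": torch.randn(8, 10, 64)}
clipped = per_layer_clip(grads, {"conv1.weight": 0.5, "fc.weight": 0.5})
```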
arXiv Detail & Related papers (2022-12-03T05:20:15Z) - Differentially Private Image Classification from Features [53.75086935617644]
Leveraging transfer learning has been shown to be an effective strategy for training large models with Differential Privacy.
Recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP.
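A minimal sketch of that setup, assuming a frozen pre-trained backbone whose features are already extracted and a single linear head trained with DP-SGD via Opacus (the reference DP library mentioned above); the call mirrors Opacus's public make_private interface, though exact signatures can vary across versions, and the feature dimensions and hyperparameters are placeholders.
```python
# DP on features: only the final linear layer is trained privately.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# stand-ins for features produced by a frozen pre-trained backbone
features, labels = torch.randn(512, 768), torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64)

head = nn.Linear(768, 10)                              # the only trainable, DP-protected part
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

head, optimizer, loader = PrivacyEngine().make_private(
    module=head, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

for x, y in loader:
    optimizer.zero_grad()
    nn.functional.cross_entropy(head(x), y).backward()
    optimizer.step()                                   # clips per-sample grads and adds noise
```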
arXiv Detail & Related papers (2022-11-24T04:04:20Z) - Improved techniques for deterministic l2 robustness [63.34032156196848]
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training.
We introduce a procedure to certify the robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP.
We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-11-15T19:10:12Z) - Differentially Private Deep Learning with ModelMix [14.445182641912014]
We propose a generic optimization framework, called ModelMix, which performs random aggregation of intermediate model states.
It strengthens the composite privacy analysis by utilizing the entropy of the training trajectory.
We present a formal study on the effect of gradient clipping in Differentially Private Gradient Descent.
arXiv Detail & Related papers (2022-10-07T22:59:00Z) - TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z) - Differentially Private Optimization on Large Model at Small Cost [39.93710312222771]
Differentially private (DP) optimization is the standard paradigm to learn large neural networks that are accurate and privacy-preserving.
Existing DP implementations are $2\sim 1000\times$ more costly in time and space complexity than standard (non-private) training.
We develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy) with a substantial improvement in computational cost.
arXiv Detail & Related papers (2022-09-30T18:38:53Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
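As a generic illustration of what re-parameterization means (the textbook Conv+BatchNorm fold rather than the OREPA block itself), the sketch below merges a convolution and its batch norm into one equivalent convolution; the layer sizes are arbitrary.
```python
# Fold a Conv2d + BatchNorm2d pair (in eval mode) into a single equivalent Conv2d.
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)    # per output channel
        fused.weight.copy_(conv.weight * scale.view(-1, 1, 1, 1))
        bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(3, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8)
bn.eval()                                                          # use running statistics
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```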
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)