NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet
with low batch sizes on the IPU
- URL: http://arxiv.org/abs/2109.12191v1
- Date: Fri, 24 Sep 2021 20:59:04 GMT
- Title: NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet
with low batch sizes on the IPU
- Authors: Edward H. Lee and Mario Michael Krell and Alexander Tsyplikhin and
Victoria Rege and Errol Colak and Kristen W. Yeom
- Abstract summary: We show that low batch sizes using group normalization on ResNet-50 can yield high accuracy and privacy on Graphcore IPUs.
This enables DPSGD training of ResNet-50 on ImageNet in just 6 hours on an IPU-POD16 system.
- Score: 56.74644007407562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private SGD (DPSGD) has recently shown promise in deep
learning. However, compared to non-private SGD, the DPSGD algorithm imposes
computational overheads that can undo the benefit of batching on GPUs.
Microbatching is a standard method to alleviate this and is fully supported in
the TensorFlow Privacy library (TFDP). However, while this technique improves
training times, it also reduces the quality of the gradients and degrades
classification accuracy. Recent works, for example those using the JAX
framework, show promise in alleviating this as well, but they still exhibit a
throughput degradation from non-private to private SGD on CNNs and have not yet
demonstrated ImageNet implementations. In our work, we argue that low batch
sizes combined with group normalization on ResNet-50 can yield high accuracy
and privacy on Graphcore IPUs. This enables DPSGD training of ResNet-50 on
ImageNet in just 6 hours (100 epochs) on an IPU-POD16 system.
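Below is a minimal, framework-agnostic sketch of the per-example DPSGD update described in the abstract: each example's gradient is clipped to a norm bound, the clipped gradients are summed, Gaussian noise scaled by a noise multiplier is added, and the result is averaged. The toy linear model and the parameter names (clip_norm, noise_multiplier) are illustrative assumptions, not the authors' IPU implementation.

```python
import numpy as np

def dpsgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DPSGD step for a toy linear regression model (illustrative only).

    Each example's gradient is clipped to `clip_norm`, the clipped gradients
    are summed, Gaussian noise with std `noise_multiplier * clip_norm` is
    added, and the result is averaged over the (micro)batch.
    """
    rng = rng or np.random.default_rng(0)
    grads = []
    for xi, yi in zip(X, y):                      # per-example (microbatch size 1)
        g = 2.0 * (xi @ w - yi) * xi              # gradient of the squared error
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)        # clip to the norm bound
        grads.append(g)
    noisy_sum = np.sum(grads, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)            # step on the noisy average gradient

# Usage: a few updates on random data with a small "nano" batch size.
rng = np.random.default_rng(42)
w = np.zeros(4)
for _ in range(3):
    X = rng.normal(size=(2, 4))                   # batch size 2
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    w = dpsgd_step(w, X, y, rng=rng)
print(w)
```

Per-example clipping (microbatch size 1) preserves gradient quality but is exactly the overhead that, per the abstract, erases the benefit of batching on GPUs; the paper's argument is that low batch sizes are a better fit for IPUs.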
Related papers
- DCT-CryptoNets: Scaling Private Inference in the Frequency Domain [8.084341432899954]
Fully homomorphic encryption (FHE) and machine learning offer unprecedented opportunities for private inference of sensitive data.
FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality.
Existing FHE-based implementations for deep neural networks face challenges in computational cost, latency, and scalability.
This paper introduces DCT-CryptoNets, a novel approach that leverages frequency-domain learning to tackle these issues.
arXiv Detail & Related papers (2024-08-27T17:48:29Z) - Sigmoid Loss for Language Image Pre-Training [93.91385557929604]
We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP)
The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization.
Combined with Locked-image Tuning, with only four TPUv4 chips, we train a SigLiT model that achieves 84.5% ImageNet zero-shot accuracy in two days.
arXiv Detail & Related papers (2023-03-27T15:53:01Z) - Differentially Private Kernel Inducing Points using features from ScatterNets (DP-KIP-ScatterNet) for Privacy Preserving Data Distillation [5.041384008847852]
We introduce differentially private kernel inducing points (DP-KIP) for privacy-preserving data distillation.
We find that KIP using infinitely-wide convolutional neural tangent kernels (conv-NTKs) performs better compared to KIP using fully-connected NTKs.
We propose DP-KIP-ScatterNet, which uses the wavelet features from Scattering networks (ScatterNet) instead of those from conv-NTKs.
arXiv Detail & Related papers (2023-01-31T03:38:09Z) - TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z) - Dep-$L_0$: Improving $L_0$-based Network Sparsification via Dependency
Modeling [6.081082481356211]
Training deep neural networks with an $L_0$ regularization is one of the prominent approaches for network pruning or sparsification.
We show that this method performs inconsistently on large-scale learning tasks, such as ResNet50 on ImageNet.
We propose a dependency modeling of binary gates, which can be modeled effectively as a multi-layer perceptron.
arXiv Detail & Related papers (2021-06-30T19:33:35Z) - Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z) - Fast and Memory Efficient Differentially Private-SGD via JL Projections [29.37156662314245]
DP-SGD is the only known algorithm for private training of large scale neural networks.
We present a new framework to design differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
arXiv Detail & Related papers (2021-02-05T06:02:10Z) - Batch Group Normalization [45.03388237812212]
Batch Normalization (BN) performs well at medium and large batch sizes.
BN saturates at small or extremely large batch sizes due to noisy or confused statistic calculation.
BGN is proposed to solve the noisy or confused statistic calculation of BN at small and extremely large batch sizes (a group-normalization sketch illustrating batch-size-independent statistics follows this list).
arXiv Detail & Related papers (2020-12-04T18:57:52Z)
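The main paper's low-batch-size result and the Batch Group Normalization entry above both rest on the same point: group normalization computes statistics per example, so they do not depend on the batch dimension. Here is a minimal NumPy sketch of group normalization; the (N, C, H, W) layout, group count, and epsilon value are illustrative assumptions rather than the paper's ResNet-50 configuration.

```python
import numpy as np

def group_norm(x, num_groups=8, eps=1e-5):
    """Group normalization for an (N, C, H, W) activation tensor.

    Statistics are computed per example over channel groups, so they do not
    depend on the batch size -- unlike batch normalization, whose statistics
    become noisy at the very small (micro)batch sizes used in DPSGD.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(n, c, h, w)

# Usage: per-example statistics are identical whether the batch holds 1 or 16 examples.
x = np.random.default_rng(0).normal(size=(1, 16, 4, 4))
print(group_norm(x).mean(), group_norm(x).std())
```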