NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet
with low batch sizes on the IPU
- URL: http://arxiv.org/abs/2109.12191v1
- Date: Fri, 24 Sep 2021 20:59:04 GMT
- Title: NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet
with low batch sizes on the IPU
- Authors: Edward H. Lee and Mario Michael Krell and Alexander Tsyplikhin and
Victoria Rege and Errol Colak and Kristen W. Yeom
- Abstract summary: We show that low batch sizes using group normalization on ResNet-50 can yield high accuracy and privacy on Graphcore IPUs.
This enables DPSGD training of ResNet-50 on ImageNet in just 6 hours on an IPU-POD16 system.
- Score: 56.74644007407562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private SGD (DPSGD) has recently shown promise in deep
learning. However, compared to non-private SGD, the DPSGD algorithm imposes
computational overheads that can undo the benefit of batching on GPUs.
Microbatching is a standard method to alleviate this and is fully supported in
the TensorFlow Privacy library (TFDP). However, while this technique improves
training times, it also reduces the quality of the gradients and degrades
classification accuracy. Recent works, for example those using the JAX
framework, show promise in alleviating this as well, but they still exhibit a
throughput degradation from non-private to private SGD on CNNs and have not yet
demonstrated ImageNet implementations. In our work, we argue that low batch
sizes combined with group normalization on ResNet-50 can yield high accuracy
and privacy on Graphcore IPUs. This enables DPSGD training of ResNet-50 on
ImageNet in just 6 hours (100 epochs) on an IPU-POD16 system.
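Below is a minimal, framework-agnostic sketch of the per-example DPSGD update described in the abstract: each example's gradient is clipped to a norm bound, the clipped gradients are summed, Gaussian noise scaled by a noise multiplier is added, and the result is averaged. The toy linear model and the parameter names (clip_norm, noise_multiplier) are illustrative assumptions, not the authors' IPU implementation.

```python
import numpy as np

def dpsgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DPSGD step for a toy linear regression model (illustrative only).

    Each example's gradient is clipped to `clip_norm`, the clipped gradients
    are summed, Gaussian noise with std `noise_multiplier * clip_norm` is
    added, and the result is averaged over the (micro)batch.
    """
    rng = rng or np.random.default_rng(0)
    grads = []
    for xi, yi in zip(X, y):                      # per-example (microbatch size 1)
        g = 2.0 * (xi @ w - yi) * xi              # gradient of the squared error
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)        # clip to the norm bound
        grads.append(g)
    noisy_sum = np.sum(grads, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)            # step on the noisy average gradient

# Usage: a few updates on random data with a small "nano" batch size.
rng = np.random.default_rng(42)
w = np.zeros(4)
for _ in range(3):
    X = rng.normal(size=(2, 4))                   # batch size 2
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    w = dpsgd_step(w, X, y, rng=rng)
print(w)
```

Per-example clipping (microbatch size 1) preserves gradient quality but is exactly the overhead that, per the abstract, erases the benefit of batching on GPUs; the paper's argument is that low batch sizes are a better fit for IPUs.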
Related papers
- DCT-CryptoNets: Scaling Private Inference in the Frequency Domain [8.084341432899954]
Fully homomorphic encryption (FHE) and machine learning offer unprecedented opportunities for private inference of sensitive data.
FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality.
Existing FHE-based implementations for deep neural networks face challenges in computational cost, latency, and scalability.
This paper introduces DCT-CryptoNets, a novel approach that leverages frequency-domain learning to tackle these issues.
arXiv Detail & Related papers (2024-08-27T17:48:29Z) - Sigmoid Loss for Language Image Pre-Training [93.91385557929604]
We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP)
The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization.
Combined with Locked-image Tuning, with only four TPUv4 chips, we train a SigLiT model that achieves 84.5% ImageNet zero-shot accuracy in two days.
arXiv Detail & Related papers (2023-03-27T15:53:01Z) - Differentially Private Kernel Inducing Points using features from ScatterNets (DP-KIP-ScatterNet) for Privacy Preserving Data Distillation [5.041384008847852]
We introduce differentially private kernel inducing points (DP-KIP) for privacy-preserving data distillation.
We find that KIP using infinitely-wide convolutional neural tangent kernels (conv-NTKs) performs better compared to KIP using fully-connected NTKs.
We propose DP-KIP-ScatterNet, which uses the wavelet features from Scattering networks (ScatterNet) instead of those from conv-NTKs.
arXiv Detail & Related papers (2023-01-31T03:38:09Z) - TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z) - Dep-$L_0$: Improving $L_0$-based Network Sparsification via Dependency
Modeling [6.081082481356211]
Training deep neural networks with an $L_0$ regularization is one of the prominent approaches for network pruning or sparsification.
We show that this method performs inconsistently on large-scale learning tasks, such as ResNet50 on ImageNet.
We propose a dependency modeling of binary gates, which can be modeled effectively as a multi-layer perceptron.
arXiv Detail & Related papers (2021-06-30T19:33:35Z) - Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z) - Fast and Memory Efficient Differentially Private-SGD via JL Projections [29.37156662314245]
DP-SGD is the only known algorithm for private training of large scale neural networks.
We present a new framework to design differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
arXiv Detail & Related papers (2021-02-05T06:02:10Z) - Batch Group Normalization [45.03388237812212]
Batch Normalization (BN) performs well at medium and large batch sizes.
BN saturates at small or extremely large batch sizes due to noisy or confused statistic calculation.
BGN is proposed to solve the noisy or confused statistic calculation of BN at small and extremely large batch sizes (a group-normalization sketch illustrating batch-size-independent statistics follows this list).
arXiv Detail & Related papers (2020-12-04T18:57:52Z)
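The main paper's low-batch-size result and the Batch Group Normalization entry above both rest on the same point: group normalization computes statistics per example, so they do not depend on the batch dimension. Here is a minimal NumPy sketch of group normalization; the (N, C, H, W) layout, group count, and epsilon value are illustrative assumptions rather than the paper's ResNet-50 configuration.

```python
import numpy as np

def group_norm(x, num_groups=8, eps=1e-5):
    """Group normalization for an (N, C, H, W) activation tensor.

    Statistics are computed per example over channel groups, so they do not
    depend on the batch size -- unlike batch normalization, whose statistics
    become noisy at the very small (micro)batch sizes used in DPSGD.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(n, c, h, w)

# Usage: per-example statistics are identical whether the batch holds 1 or 16 examples.
x = np.random.default_rng(0).normal(size=(1, 16, 4, 4))
print(group_norm(x).mean(), group_norm(x).std())
```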