Scaling up Differentially Private Deep Learning with Fast Per-Example
Gradient Clipping
- URL: http://arxiv.org/abs/2009.03106v1
- Date: Mon, 7 Sep 2020 13:51:26 GMT
- Title: Scaling up Differentially Private Deep Learning with Fast Per-Example
Gradient Clipping
- Authors: Jaewoo Lee and Daniel Kifer
- Abstract summary: Recent work on Renyi Differential Privacy has shown the feasibility of applying differential privacy to deep learning tasks.
Despite their promise, differentially private deep networks often lag far behind their non-private counterparts in accuracy.
One of the barriers to this expanded research is the training time -- often orders of magnitude larger than training non-private networks.
- Score: 15.410557873153833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on Renyi Differential Privacy has shown the feasibility of
applying differential privacy to deep learning tasks. Despite their promise,
however, differentially private deep networks often lag far behind their
non-private counterparts in accuracy, showing the need for more research in
model architectures, optimizers, etc. One of the barriers to this expanded
research is the training time -- often orders of magnitude larger than training
non-private networks. The reason for this slowdown is a crucial privacy-related
step called "per-example gradient clipping" whose naive implementation undoes
the benefits of batch training with GPUs. By analyzing the back-propagation
equations we derive new methods for per-example gradient clipping that are
compatible with auto-differentiation (e.g., in PyTorch and TensorFlow) and
provide better GPU utilization. Our implementation in PyTorch showed
significant training speed-ups (by factors of 54x - 94x for training various
models with batch sizes of 128). These techniques work for a variety of
architectural choices including convolutional layers, recurrent networks,
attention, residual blocks, etc.
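To make the clipping bottleneck concrete, the sketch below shows the naive per-example gradient clipping loop used in standard DP-SGD, written in PyTorch: one backward pass per example so that each example's gradient can be clipped before aggregation, which is exactly the serialization that erases the benefit of batched GPU training. This is a minimal illustration of the slow baseline, not the paper's derived method; the toy model and the hyperparameters clip_norm, noise_multiplier, and lr are assumptions chosen for the example.

```python
# Minimal sketch of naive per-example gradient clipping (standard DP-SGD style).
# Illustrative only: the model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
clip_norm = 1.0          # per-example L2 clipping threshold C (assumed)
noise_multiplier = 1.1   # Gaussian noise scale sigma relative to C (assumed)
lr = 0.1                 # learning rate (assumed)

x = torch.randn(128, 20)          # one synthetic batch of 128 examples
y = torch.randint(0, 2, (128,))

params = [p for p in model.parameters() if p.requires_grad]
clipped_sum = [torch.zeros_like(p) for p in params]

# Naive per-example clipping: one backward pass per example, so the batch
# is processed serially instead of in a single batched GPU pass.
for i in range(x.shape[0]):
    model.zero_grad()
    loss = loss_fn(model(x[i:i + 1]), y[i:i + 1])
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, clip_norm / (total_norm.item() + 1e-6))
    for acc, g in zip(clipped_sum, grads):
        acc.add_(g, alpha=scale)

# Perturb the clipped sum with Gaussian noise and take an averaged step.
with torch.no_grad():
    for p, acc in zip(params, clipped_sum):
        noise = torch.randn_like(acc) * noise_multiplier * clip_norm
        p.add_(-(lr / x.shape[0]) * (acc + noise))
```

The paper's methods, by contrast, derive the clipped aggregate from the back-propagation equations so that the work stays batched on the GPU, avoiding this Python-level loop.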
Related papers
- Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z)
- Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models [7.49320945341034]
We show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements.
Our results are a step towards efficient model architectures that make optimal use of their parameters.
arXiv Detail & Related papers (2023-01-30T17:43:47Z)
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
arXiv Detail & Related papers (2022-12-03T05:20:15Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients (the standard clip-and-noise update is sketched after this list).
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- APP: Anytime Progressive Pruning [104.36308667437397]
We propose a novel way of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA).
The proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training.
arXiv Detail & Related papers (2022-04-04T16:38:55Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Differentially Private Deep Learning with Direct Feedback Alignment [15.410557873153833]
We propose the first differentially private method for training deep neural networks with direct feedback alignment (DFA).
DFA achieves significant gains in accuracy (often by 10-20%) compared to backprop-based differentially private training on a variety of architectures.
arXiv Detail & Related papers (2020-10-08T00:25:22Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
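For reference, several entries above rely on the standard DP-SGD step: clip each per-example gradient to an L2 norm bound C, sum the clipped gradients, and perturb the sum with Gaussian noise before averaging. The formulation below is the generic one from the DP-SGD literature, not text taken from any abstract on this page.

```latex
% Standard DP-SGD update: per-example clipping followed by Gaussian noise.
\[
  g_i = \nabla_\theta \, \ell(x_i; \theta), \qquad
  \bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right),
\]
\[
  \theta \leftarrow \theta - \frac{\eta}{|B|}
  \left( \sum_{i \in B} \bar{g}_i + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right) \right)
\]
```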
This list is automatically generated from the titles and abstracts of the papers on this site.