Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models
- URL: http://arxiv.org/abs/2301.13104v2
- Date: Wed, 21 Jun 2023 12:03:57 GMT
- Title: Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models
- Authors: Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis
- Abstract summary: We show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements.
Our results are a step towards efficient model architectures that make optimal use of their parameters.
- Score: 7.49320945341034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) limits the amount
of private information deep learning models can memorize during training. This
is achieved by clipping and adding noise to the model's gradients, and thus
networks with more parameters require proportionally stronger perturbation. As
a result, large models have difficulties learning useful information, rendering
training with DP-SGD exceedingly difficult on more challenging training tasks.
Recent research has focused on combating this challenge through training
adaptations such as heavy data augmentation and large batch sizes. However,
these techniques further increase the computational overhead of DP-SGD and
reduce its practical applicability. In this work, we propose using the
principle of sparse model design to solve precisely such complex tasks with
fewer parameters, higher accuracy, and in less time, thus serving as a
promising direction for DP-SGD. We achieve this sparsity by design,
introducing equivariant convolutional networks for model training with
Differential Privacy. Using equivariant networks, we show that small and
efficient architecture design can outperform current state-of-the-art models
with substantially lower computational requirements. On CIFAR-10, we achieve an
increase of up to $9\%$ in accuracy while reducing the computation time by more
than $85\%$. Our results are a step towards efficient model architectures that
make optimal use of their parameters and bridge the privacy-utility gap between
private and non-private deep learning for computer vision.
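To make the clipping-and-noising mechanism concrete, here is a minimal sketch of one DP-SGD step, assuming PyTorch; the microbatch loop, clip norm, and noise multiplier are illustrative choices rather than the paper's implementation, and calibrating the noise to a target (epsilon, delta) budget requires a separate privacy accountant.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    sum the clipped gradients, add Gaussian noise scaled by
    noise_mult * clip_norm, then take an averaged descent step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # per-example gradients via a microbatch loop
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total + 1e-6)).clamp(max=1.0)  # clip to C
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            # Noise of std noise_mult * clip_norm lands on every coordinate,
            # so total perturbation grows with the parameter count.
            s.add_(torch.randn_like(s), alpha=noise_mult * clip_norm)
            p.add_(s, alpha=-lr / len(xs))
```

Because every parameter is perturbed, larger models absorb more total noise, which is the abstract's argument for sparser architectures.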
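The sparsity-by-design idea can likewise be sketched: a group-equivariant layer shares one filter bank across transformed copies, here the four 90-degree rotations (the group p4), gaining rotation structure at the parameter cost of a single ordinary convolution. This toy module is an illustration only, not the architecture used in the paper; steerable E(2)-equivariant layers generalize well beyond 90-degree rotations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class P4LiftingConv(nn.Module):
    """Toy lifting convolution for the group p4 (rotations by multiples of
    90 degrees): the same learned filters are reused at four orientations."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x):
        # Convolve with the filter bank rotated by 0/90/180/270 degrees.
        # Rotating the input permutes the orientation axis of the output
        # (up to the spatial rotation), which is the equivariance property.
        ws = [torch.rot90(self.weight, r, dims=(2, 3)) for r in range(4)]
        out = [F.conv2d(x, w, padding=self.weight.shape[-1] // 2) for w in ws]
        return torch.stack(out, dim=2)  # shape (N, out_ch, 4, H, W)
```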
Related papers
- Zero redundancy distributed learning with differential privacy [26.89679585840689]
We develop a new systematic solution, DP-ZeRO, to scale up the trainable DP model size.
Our DP-ZeRO has the potential to train models of arbitrary size and is evaluated on the world's largest DP models.
arXiv Detail & Related papers (2023-11-20T14:58:56Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
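To see the sparsity point in the entry above concretely: an embedding gradient touches only the looked-up rows, but adding Gaussian noise to the whole table densifies it. The snippet below only demonstrates the problem; it is not the DP-FEST or DP-AdaFEST algorithm, whose private row-selection mechanisms are specified in the paper.

```python
import torch

# Why naive DP-SGD densifies embedding gradients: only the looked-up rows of
# the table receive gradient, but Gaussian noise lands on every row.
vocab, dim = 10_000, 16
grad = torch.zeros(vocab, dim)
grad[torch.tensor([3, 42, 7])] = torch.randn(3, dim)  # sparse: 3 touched rows
print(int((grad.abs().sum(1) > 0).sum()))             # -> 3

noisy = grad + torch.randn_like(grad)                 # naive DP-SGD noising
print(int((noisy.abs().sum(1) > 0).sum()))            # -> 10000, fully dense
```

A sparsity-preserving scheme would privately select a small set of rows and confine the noise to them, keeping the update sparse.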
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
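A rough sketch of the scaling-law idea in the TAN entry above, under the assumption (stated loosely here; see the paper for the precise claim) that utility depends mainly on the noise-to-batch-size ratio: hyperparameters can then be searched with a cheap small-batch run at the same ratio before launching the expensive large-batch run once. All numbers below are purely illustrative.

```python
# Hypothetical illustration: hold sigma / B fixed to preview an expensive
# large-batch DP-SGD configuration with a cheap small-batch run.
target_sigma, target_batch = 8.0, 4096   # expensive configuration (assumed)
cheap_batch = 256                        # affordable simulation batch size
cheap_sigma = target_sigma * cheap_batch / target_batch  # keep sigma/B fixed
print(cheap_sigma)  # 0.5 -> same ratio, ~16x fewer gradient computations
```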
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DP-SGD requires clipping and noising of per-sample gradients.
This introduces a reduction in model utility compared to non-private training.
We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
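The memory-saving clipping in the entry above (ghost clipping) can be illustrated on a single linear layer, where the per-example gradient norm factorizes and the per-example gradients never need to be materialized. This sketch assumes plain 2-D activations; the paper's technique extends to full Transformer layers.

```python
import torch

# For a linear layer y = x @ W.T, the per-example weight gradient is the
# outer product g_i a_i^T, so its Frobenius norm factorizes as
# ||g_i|| * ||a_i||, computable without storing any per-example gradient.
torch.manual_seed(0)
B, d_in, d_out = 32, 128, 64
a = torch.randn(B, d_in)                 # layer inputs (activations)
g = torch.randn(B, d_out)                # gradients w.r.t. the layer output
norms = a.norm(dim=1) * g.norm(dim=1)    # per-example grad norms, cheap

# Sanity check against explicitly materialized per-example gradients:
explicit = torch.einsum("bo,bi->boi", g, a).flatten(1).norm(dim=1)
print(torch.allclose(norms, explicit, atol=1e-4))  # True
```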
- An Efficient DP-SGD Mechanism for Large Scale NLP Models [28.180412581994485]
Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers.
It is desirable that underlying models do not expose private information contained in the training data.
Differentially Private Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
arXiv Detail & Related papers (2021-07-14T15:23:27Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)