Pre-Pruning and Gradient-Dropping Improve Differentially Private Image
Classification
- URL: http://arxiv.org/abs/2306.11754v1
- Date: Mon, 19 Jun 2023 14:35:28 GMT
- Title: Pre-Pruning and Gradient-Dropping Improve Differentially Private Image
Classification
- Authors: Kamil Adamczewski, Yingchen He, Mijung Park
- Abstract summary: We introduce a new training paradigm that uses \textit{pre-pruning} and \textit{gradient-dropping} to reduce the parameter space and improve scalability.
Our training paradigm introduces a tension between the rates of pre-pruning and gradient-dropping, privacy loss, and classification accuracy.
- Score: 9.120531252536617
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Scalability is a significant challenge when it comes to applying differential
privacy to training deep neural networks. The commonly used DP-SGD algorithm
struggles to maintain a high level of privacy protection while achieving high
accuracy on even moderately sized models. To tackle this challenge, we take
advantage of the fact that neural networks are overparameterized, which allows
us to improve neural network training with differential privacy. Specifically,
we introduce a new training paradigm that uses \textit{pre-pruning} and
\textit{gradient-dropping} to reduce the parameter space and improve
scalability. The process starts with pre-pruning the parameters of the original
network to obtain a smaller model that is then trained with DP-SGD. During
training, less important gradients are dropped, and only selected gradients are
updated. Our training paradigm introduces a tension between the rates of
pre-pruning and gradient-dropping, privacy loss, and classification accuracy.
Too much pre-pruning and gradient-dropping reduces the model's capacity and
worsens accuracy, while training a smaller model requires less privacy budget
for achieving good accuracy. We evaluate the interplay between these factors
and demonstrate the effectiveness of our training paradigm for both training
from scratch and fine-tuning pre-trained networks on several benchmark image
classification datasets. The tools can also be readily incorporated into
existing training paradigms.
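To make the two ingredients concrete, below is a minimal, illustrative sketch (not the authors' released code) of pre-pruning followed by a DP-SGD step with gradient-dropping. The model, pruning rate, dropping rate, and other hyperparameters are placeholder assumptions, and the sketch applies gradient-dropping to the noisy aggregate gradient so the coordinate selection itself consumes no extra privacy budget; the paper's exact mechanism may differ.

```python
# Minimal sketch of pre-pruning + gradient-dropping around a hand-rolled DP-SGD step.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def pre_prune(model: nn.Module, prune_rate: float) -> dict:
    """Zero out the smallest-magnitude weights before training and return
    binary masks describing the retained (smaller) parameter space."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices; keep biases intact
            k = int(prune_rate * p.numel())
            if k > 0:
                threshold = p.abs().flatten().kthvalue(k).values
                mask = (p.abs() > threshold).float()
            else:
                mask = torch.ones_like(p)
            p.data.mul_(mask)
        else:
            mask = torch.ones_like(p)
        masks[name] = mask
    return masks


def dp_sgd_step(model, xs, ys, masks, lr=0.1, clip_norm=1.0,
                noise_mult=1.0, drop_rate=0.5):
    """One DP-SGD step: per-example clipping, Gaussian noise, then
    gradient-dropping on the noisy aggregate so that only the
    largest-magnitude coordinates inside the pruned support are updated."""
    names = [n for n, _ in model.named_parameters()]
    params = [p for _, p in model.named_parameters()]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(xs, ys):  # per-example gradients (naive loop for clarity)
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)  # clip each example's gradient, then accumulate

    with torch.no_grad():
        for name, p, s in zip(names, params, summed):
            noisy = (s + noise_mult * clip_norm * torch.randn_like(s)) / len(xs)
            noisy = noisy * masks[name]  # stay on the pre-pruned support
            k = int((1.0 - drop_rate) * noisy.numel())
            if 0 < k < noisy.numel():  # drop all but the k largest coordinates
                thresh = noisy.abs().flatten().topk(k).values.min()
                noisy = noisy * (noisy.abs() >= thresh).float()
            p.add_(-lr * noisy)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    masks = pre_prune(model, prune_rate=0.8)          # shrink the parameter space
    xs, ys = torch.randn(32, 784), torch.randint(0, 10, (32,))
    dp_sgd_step(model, xs, ys, masks, drop_rate=0.5)  # one private update
```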
Related papers
- Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Sparsity-Preserving Differentially Private Training of Large Embedding
Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - Differential Privacy Meets Neural Network Pruning [10.77469946354744]
We study the interplay between neural network pruning and differential privacy, through the two modes of parameter updates.
Our experimental results demonstrate how decreasing the parameter space improves differentially private training.
By studying two popular forms of pruning which do not rely on gradients and do not incur an additional privacy loss, we show that random selection performs on par with magnitude-based selection.
arXiv Detail & Related papers (2023-03-08T14:27:35Z) - Equivariant Differentially Private Deep Learning: Why DP-SGD Needs
Sparser Models [7.49320945341034]
We show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements.
Our results are a step towards efficient model architectures that make optimal use of their parameters.
arXiv Detail & Related papers (2023-01-30T17:43:47Z) - Adversarial training with informed data selection [53.19381941131439]
Adversarial training is the most efficient solution to defend the network against these malicious attacks.
This work proposes a data selection strategy to be applied in the mini-batch training.
The simulation results show that a good compromise can be obtained regarding robustness and standard accuracy.
arXiv Detail & Related papers (2023-01-07T12:09:50Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Wide Network Learning with Differential Privacy [7.453881927237143]
The current generation of neural networks suffers a significant loss of accuracy under most practically relevant privacy training regimes.
We develop a general approach to training these models that takes advantage of the sparsity of the gradients of private Empirical Risk Minimization (ERM).
For the same number of trainable parameters, we propose a novel algorithm for privately training neural networks.
arXiv Detail & Related papers (2021-03-01T20:31:50Z) - A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via
Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a \textit{slow start, fast decay} learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)