DP-FP: Differentially Private Forward Propagation for Large Models
- URL: http://arxiv.org/abs/2112.14430v1
- Date: Wed, 29 Dec 2021 07:32:29 GMT
- Title: DP-FP: Differentially Private Forward Propagation for Large Models
- Authors: Jian Du and Haitao Mi
- Abstract summary: We show how to mitigate the performance drop by replacing Differentially Private Stochastic Gradient Descent (DP-SGD) with a novel DP Forward-Propagation (DP-FP).
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
- Score: 2.062295244789704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to large-scale learning problems, the conventional wisdom on
privacy-preserving deep learning, known as Differentially Private Stochastic
Gradient Descent (DP-SGD), has met with limited success due to significant
performance degradation and high memory overhead compared to its non-private
counterpart. We show how to mitigate the performance drop by replacing DP-SGD
with a novel DP Forward-Propagation (DP-FP) followed by an off-the-shelf non-DP
optimizer. Our DP-FP employs two novel mechanisms: (1) representation clipping
followed by noise addition in the forward-propagation stage, and (2) micro-batch
construction via subsampling to achieve DP amplification and reduce the noise
power to $1/M$, where $M$ is the number of micro-batches in a step. When training
a classification model, DP-FP, with all privacy-preserving operations applied to
the representation, is innately free of the gradient bias, model-size-proportional
total noise, and memory issues that affect DP-SGD. As a result, DP-FP outperforms
cutting-edge DP-SGD while retaining the same level of privacy, approaching
non-private baselines and significantly outperforming state-of-the-art DP-SGD
variants. When applied to RoBERTa-large on four downstream tasks, for example,
DP-FP achieves an average accuracy of 91.34% with privacy budgets below 3, a
3.81% improvement over state-of-the-art DP-SGD and only a 0.9% loss relative to
the non-private baseline, with a significantly lower privacy leakage risk.
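As a rough illustration of the two mechanisms in the abstract, the sketch below clips each example's forward-pass representation to an L2 bound, adds Gaussian noise calibrated to that bound, and splits the batch into micro-batches before taking a standard non-DP optimizer step. This is a minimal PyTorch-style sketch under assumptions, not the authors' implementation: the helper names, the clipping bound, the noise multiplier, and the simple batch chunking (a stand-in for the paper's subsampling-based micro-batch construction) are all illustrative.

```python
import torch

def privatize_representation(h, clip_bound=1.0, noise_multiplier=0.5):
    """Clip each example's representation to L2 norm <= clip_bound, then add
    Gaussian noise scaled to that bound (the forward-propagation DP step)."""
    norms = h.norm(dim=-1, keepdim=True)                      # per-example L2 norms
    h_clipped = h * torch.clamp(clip_bound / (norms + 1e-12), max=1.0)
    noise = torch.randn_like(h_clipped) * noise_multiplier * clip_bound
    return h_clipped + noise

def dp_fp_step(encoder, classifier, loss_fn, optimizer, batch, labels,
               num_microbatches=4):
    """One DP-FP-style step: privacy operations act only on the representation,
    so backpropagation and the optimizer remain ordinary non-DP components."""
    optimizer.zero_grad()
    for micro_x, micro_y in zip(batch.chunk(num_microbatches),
                                labels.chunk(num_microbatches)):
        h = encoder(micro_x)                    # standard forward pass
        h = privatize_representation(h)         # clip + noise on the representation
        loss = loss_fn(classifier(h), micro_y) / num_microbatches
        loss.backward()                         # ordinary, unbiased backprop
    optimizer.step()                            # any off-the-shelf optimizer
```

Because the noise is injected on the representation rather than on per-example gradients, it does not grow with the parameter count, which matches the abstract's claims that the total noise no longer scales with model size and that DP-SGD's memory overhead is avoided.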
Related papers
- DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction [47.65999101635902]
Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from trained machine learning models.
We develop a new component, called DOPPLER, which works by effectively amplifying the gradient while suppressing the DP noise within this frequency domain.
Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy.
arXiv Detail & Related papers (2024-08-24T04:27:07Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
Yet DP is not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent (a minimal sketch of this baseline update appears after this list).
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - DP-SGD for non-decomposable objective functions [0.0]
We develop a new variant for similarity-based loss functions that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$.
Our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
arXiv Detail & Related papers (2023-10-04T18:48:16Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) algorithm commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Dynamic Differential-Privacy Preserving SGD [19.273542515320372]
Differentially Private Stochastic Gradient Descent (DP-SGD) prevents training-data privacy breaches by adding noise to the clipped gradient during SGD training.
Using the same clipping operation and additive noise across training steps results in unstable updates and even a ramp-up period.
We propose dynamic DP-SGD, which incurs a lower privacy cost than standard DP-SGD during updates until both reach the same target privacy budget.
arXiv Detail & Related papers (2021-10-30T04:45:11Z)
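For contrast with DP-FP's forward-pass privatization, most of the papers above modify the standard DP-SGD update, which clips each per-example gradient to an L2 bound, sums the clipped gradients, adds Gaussian noise calibrated to that bound, and averages before the optimizer step. The sketch below is a minimal, illustrative rendering of that baseline, not any specific paper's implementation; the explicit per-example Python loop, clipping bound, and noise multiplier are simplifications (practical implementations vectorize the per-example gradients).

```python
import torch

def dp_sgd_step(model, loss_fn, optimizer, batch, labels,
                clip_bound=1.0, noise_multiplier=1.0):
    """One baseline DP-SGD step: per-example gradient clipping followed by
    Gaussian noise on the summed gradient, then an ordinary optimizer step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Clip each example's gradient to L2 norm <= clip_bound and accumulate.
    for x, y in zip(batch, labels):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_bound / (float(total_norm) + 1e-12))
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)

    # Add noise calibrated to the clipping bound, average, and take the step.
    batch_size = batch.shape[0]
    for p, acc in zip(params, summed):
        noise = torch.randn_like(acc) * noise_multiplier * clip_bound
        p.grad = (acc + noise) / batch_size
    optimizer.step()
```

The per-parameter noise and per-example gradient computation in this loop are exactly the model-size-proportional noise and memory costs that DP-FP's representation-level noise is designed to avoid.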