DP-FP: Differentially Private Forward Propagation for Large Models
- URL: http://arxiv.org/abs/2112.14430v1
- Date: Wed, 29 Dec 2021 07:32:29 GMT
- Title: DP-FP: Differentially Private Forward Propagation for Large Models
- Authors: Jian Du and Haitao Mi
- Abstract summary: We show how to mitigate the performance drop by replacing the Differential Private Gradient Descent with a novel DP Forward-Propagation (DP-FP)
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
- Score: 2.062295244789704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to large-scale learning problems, the conventional wisdom on
privacy-preserving deep learning, known as Differential Private Stochastic
Gradient Descent (DP-SGD), has met with limited success due to significant
performance degradation and high memory overhead when compared to the
non-privacy counterpart. We show how to mitigate the performance drop by
replacing the DP-SGD with a novel DP Forward-Propagation (DP-FP) followed by an
off-the-shelf non-DP optimizer. Our DP-FP employs novel (1) representation
clipping followed by noise addition in the forward propagation stage, as well
as (2) micro-batch construction via subsampling to achieve DP amplification and
reduce noise power to $1/M$, where $M$ is the number of micro-batch in a step.
When training a classification model, our DP-FP with all of the
privacy-preserving operations on the representation is innately free of
gradient bias, total noise proportionally to model size, and memory issues in
DP-SGD. As a result, our DP-FP outperforms cutting-edge DP-SGD while retaining
the same level of privacy, and it approaches non-private baselines and
significantly outperforms state-of-the-art DP-SGD variants. When applied to
RoBERTa-large on four downstream tasks, for example, DP-FP achieves an average
accuracy of 91.34\% with privacy budgets less than 3, representing a 3.81\%
performance improvement over the state-of-the-art DP-SGD and only a 0.9\% loss
compared to the non-private baseline but with a significantly lower privacy
leakage risk.
Related papers
- Pre-training Differentially Private Models with Limited Public Data [58.945400707033016]
differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We propose a novel DP continual pre-training strategy using only 10% of public data.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order
Optimization [54.24600476755372]
We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization.
We show that DP-ZO exhibits just $1.86%$ performance degradation due to privacy at $ (1,10-5)$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Conciliating Privacy and Utility in Data Releases via Individual Differential Privacy and Microaggregation [4.287502453001108]
$epsilon$-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees.
We propose $epsilon$-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects.
We report on experiments that show how our approach can provide strong privacy (small $epsilon$) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis.
arXiv Detail & Related papers (2023-12-21T10:23:18Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Sparsity-Preserving Differentially Private Training of Large Embedding
Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - DP-SGD for non-decomposable objective functions [0.0]
We develop a new variant for similarity based loss functions that manipulates gradients of the objective function in a novel way to obtain a senstivity of the summed gradient that is $O(1)$ for batch size $n$.
Our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
arXiv Detail & Related papers (2023-10-04T18:48:16Z) - Towards the Flatter Landscape and Better Generalization in Federated
Learning under Client-level Differential Privacy [67.33715954653098]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP.
Specifically, DP-FedSAM integrates Sharpness Aware of Minimization (SAM) to generate local flatness models with stability and weight robustness.
To further reduce the magnitude random noise while achieving better performance, we propose DP-FedSAM-$top_k$ by adopting the local update sparsification technique.
arXiv Detail & Related papers (2023-05-01T15:19:09Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with
Importance Sampling [19.59757201902467]
differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Dynamic Differential-Privacy Preserving SGD [19.273542515320372]
Differentially-Private Gradient Descent (DP-SGD) prevents training-data privacy breaches by adding noise to the clipped gradient during SGD training.
The same clipping operation and additive noise across training steps results in unstable updates and even a ramp-up period.
We propose the dynamic DP-SGD, which has a lower privacy cost than the DP-SGD during updates until they achieve the same target privacy budget.
arXiv Detail & Related papers (2021-10-30T04:45:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.