DP-FP: Differentially Private Forward Propagation for Large Models
- URL: http://arxiv.org/abs/2112.14430v1
- Date: Wed, 29 Dec 2021 07:32:29 GMT
- Title: DP-FP: Differentially Private Forward Propagation for Large Models
- Authors: Jian Du and Haitao Mi
- Abstract summary: We show how to mitigate the performance drop by replacing Differentially Private Stochastic Gradient Descent (DP-SGD) with a novel DP Forward-Propagation (DP-FP).
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
- Score: 2.062295244789704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied to large-scale learning problems, the conventional wisdom on
privacy-preserving deep learning, known as Differentially Private Stochastic
Gradient Descent (DP-SGD), has met with limited success due to significant
performance degradation and high memory overhead compared to its non-private
counterpart. We show how to mitigate the performance drop by replacing DP-SGD
with a novel DP Forward-Propagation (DP-FP) followed by an off-the-shelf non-DP
optimizer. Our DP-FP employs two novel mechanisms: (1) representation clipping
followed by noise addition in the forward-propagation stage, and (2) micro-batch
construction via subsampling to achieve DP amplification and reduce the noise
power to $1/M$, where $M$ is the number of micro-batches in a step. When training
a classification model, DP-FP, with all privacy-preserving operations applied to
the representation, is innately free of the gradient bias, model-size-proportional
total noise, and memory issues that affect DP-SGD. As a result, DP-FP outperforms
cutting-edge DP-SGD while retaining the same level of privacy, approaching
non-private baselines and significantly outperforming state-of-the-art DP-SGD
variants. When applied to RoBERTa-large on four downstream tasks, for example,
DP-FP achieves an average accuracy of 91.34% with privacy budgets below 3, a
3.81% improvement over state-of-the-art DP-SGD and only a 0.9% loss relative to
the non-private baseline, with a significantly lower privacy leakage risk.
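As a rough illustration of the two mechanisms in the abstract, the sketch below clips each example's forward-pass representation to an L2 bound, adds Gaussian noise calibrated to that bound, and splits the batch into micro-batches before taking a standard non-DP optimizer step. This is a minimal PyTorch-style sketch under assumptions, not the authors' implementation: the helper names, the clipping bound, the noise multiplier, and the simple batch chunking (a stand-in for the paper's subsampling-based micro-batch construction) are all illustrative.

```python
import torch

def privatize_representation(h, clip_bound=1.0, noise_multiplier=0.5):
    """Clip each example's representation to L2 norm <= clip_bound, then add
    Gaussian noise scaled to that bound (the forward-propagation DP step)."""
    norms = h.norm(dim=-1, keepdim=True)                      # per-example L2 norms
    h_clipped = h * torch.clamp(clip_bound / (norms + 1e-12), max=1.0)
    noise = torch.randn_like(h_clipped) * noise_multiplier * clip_bound
    return h_clipped + noise

def dp_fp_step(encoder, classifier, loss_fn, optimizer, batch, labels,
               num_microbatches=4):
    """One DP-FP-style step: privacy operations act only on the representation,
    so backpropagation and the optimizer remain ordinary non-DP components."""
    optimizer.zero_grad()
    for micro_x, micro_y in zip(batch.chunk(num_microbatches),
                                labels.chunk(num_microbatches)):
        h = encoder(micro_x)                    # standard forward pass
        h = privatize_representation(h)         # clip + noise on the representation
        loss = loss_fn(classifier(h), micro_y) / num_microbatches
        loss.backward()                         # ordinary, unbiased backprop
    optimizer.step()                            # any off-the-shelf optimizer
```

Because the noise is injected on the representation rather than on per-example gradients, it does not grow with the parameter count, which matches the abstract's claims that the total noise no longer scales with model size and that DP-SGD's memory overhead is avoided.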
Related papers
- DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction [47.65999101635902]
Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from trained machine learning models.
We develop a new component, called DOPPLER, which works by effectively amplifying the gradient while suppressing the DP noise within this frequency domain.
Our experiments show that the proposed DP optimizers with a low-pass filter outperform their counterparts without the filter by 3%-10% in test accuracy.
arXiv Detail & Related papers (2024-08-24T04:27:07Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
Yet DP is not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent (a minimal sketch of this baseline update appears after this list).
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - DP-SGD for non-decomposable objective functions [0.0]
We develop a new variant for similarity-based loss functions that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$.
Our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
arXiv Detail & Related papers (2023-10-04T18:48:16Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the stochastic gradient descent (SGD) algorithm commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Dynamic Differential-Privacy Preserving SGD [19.273542515320372]
Differentially Private Stochastic Gradient Descent (DP-SGD) prevents training-data privacy breaches by adding noise to the clipped gradient during SGD training.
Using the same clipping operation and additive noise across training steps results in unstable updates and even a ramp-up period.
We propose dynamic DP-SGD, which incurs a lower privacy cost than standard DP-SGD during updates until both reach the same target privacy budget.
arXiv Detail & Related papers (2021-10-30T04:45:11Z)
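For contrast with DP-FP's forward-pass privatization, most of the papers above modify the standard DP-SGD update, which clips each per-example gradient to an L2 bound, sums the clipped gradients, adds Gaussian noise calibrated to that bound, and averages before the optimizer step. The sketch below is a minimal, illustrative rendering of that baseline, not any specific paper's implementation; the explicit per-example Python loop, clipping bound, and noise multiplier are simplifications (practical implementations vectorize the per-example gradients).

```python
import torch

def dp_sgd_step(model, loss_fn, optimizer, batch, labels,
                clip_bound=1.0, noise_multiplier=1.0):
    """One baseline DP-SGD step: per-example gradient clipping followed by
    Gaussian noise on the summed gradient, then an ordinary optimizer step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Clip each example's gradient to L2 norm <= clip_bound and accumulate.
    for x, y in zip(batch, labels):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_bound / (float(total_norm) + 1e-12))
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)

    # Add noise calibrated to the clipping bound, average, and take the step.
    batch_size = batch.shape[0]
    for p, acc in zip(params, summed):
        noise = torch.randn_like(acc) * noise_multiplier * clip_bound
        p.grad = (acc + noise) / batch_size
    optimizer.step()
```

The per-parameter noise and per-example gradient computation in this loop are exactly the model-size-proportional noise and memory costs that DP-FP's representation-level noise is designed to avoid.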