Improving Differentially Private SGD via Randomly Sparsified Gradients
- URL: http://arxiv.org/abs/2112.00845v3
- Date: Wed, 28 Jun 2023 13:30:48 GMT
- Title: Improving Differentially Private SGD via Randomly Sparsified Gradients
- Authors: Junyi Zhu, Matthew B. Blaschko
- Abstract summary: Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy. We propose an efficient and lightweight extension using random sparsification (RS) to strengthen DP-SGD; the resulting sparse gradients also reduce communication cost and strengthen privacy against reconstruction attacks.
- Score: 31.295035726077366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private stochastic gradient descent (DP-SGD) has been widely
adopted in deep learning to provide rigorously defined privacy, which requires
gradient clipping to bound the maximum norm of individual gradients and
additive isotropic Gaussian noise. With analysis of the convergence rate of
DP-SGD in a non-convex setting, we identify that randomly sparsifying gradients
before clipping and noisification adjusts a trade-off between internal
components of the convergence bound and leads to a smaller upper bound when the
noise is dominant. Additionally, our theoretical analysis and empirical
evaluations show that the trade-off is not trivial but possibly a unique
property of DP-SGD, as either canceling noisification or gradient clipping
eliminates the trade-off in the bound. This observation is indicative, as it
implies DP-SGD has special inherent room for (even simply random) gradient
compression. To verify the observation and utilize it, we propose an efficient
and lightweight extension using random sparsification (RS) to strengthen
DP-SGD. Experiments with various DP-SGD frameworks show that RS can improve
performance. Additionally, the produced sparse gradients of RS exhibit
advantages in reducing communication cost and strengthening privacy against
reconstruction attacks, which are also key problems in private machine
learning.
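The pipeline described in the abstract (randomly sparsify each gradient, then clip, then add Gaussian noise) can be summarized in a short sketch. This is a minimal illustration, not the authors' implementation: the keep probability p, clip bound C, and noise multiplier sigma are placeholder values, and details such as whether the mask is shared across the batch are simplifications.

```python
import numpy as np

def dpsgd_rs_step(per_sample_grads, p=0.5, C=1.0, sigma=1.0, rng=None):
    """One DP-SGD update direction with random sparsification (RS) -- sketch.

    per_sample_grads: array of shape (batch, dim), one gradient per example.
    p: probability of keeping each coordinate (illustrative value).
    C: per-sample L2 clipping bound; sigma: Gaussian noise multiplier.
    """
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape

    # 1) Random sparsification: drop a random subset of coordinates before
    #    clipping and noisification (one shared mask, for simplicity).
    mask = rng.random(d) < p
    g = per_sample_grads * mask

    # 2) Per-sample clipping to bound each individual gradient's L2 norm by C.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    # 3) Sum, add isotropic Gaussian noise calibrated to C on the kept
    #    coordinates, and average over the batch.
    noise = rng.normal(0.0, sigma * C, size=d) * mask
    return (g.sum(axis=0) + noise) / B

# Toy usage
grads = np.random.default_rng(0).normal(size=(8, 16))
update = dpsgd_rs_step(grads, p=0.5)
```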
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC)
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv Detail & Related papers (2024-11-05T12:47:30Z)
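DP-PSASC replaces hard clipping with non-monotonous adaptive per-sample scaling. The abstract does not give the scaling rule, so the weight function w below is a hypothetical stand-in chosen only so that each scaled gradient stays within a sensitivity bound C; the actual DP-PSASC rule is defined in the paper.

```python
import numpy as np

def adaptive_scaling_step(per_sample_grads, C=1.0, sigma=1.0, rng=None):
    """Per-sample adaptive scaling in place of hard clipping (illustrative)."""
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape
    norms = np.linalg.norm(per_sample_grads, axis=1)

    # Hypothetical non-monotonic weight: w(n) = C * n / (n^2 + eps), so the
    # scaled norm w(n) * n = C * n^2 / (n^2 + eps) never exceeds C.
    eps = 1e-3
    w = C * norms / (norms ** 2 + eps)
    scaled = per_sample_grads * w[:, None]

    # Noise is still calibrated to the per-sample bound C, as in DP-SGD.
    noise = rng.normal(0.0, sigma * C, size=d)
    return (scaled.sum(axis=0) + noise) / B
```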
- Rethinking Improved Privacy-Utility Trade-off with Pre-existing Knowledge for DP Training [31.559864332056648]
We propose a generic differential privacy framework with heterogeneous noise (DP-Hero).
Atop DP-Hero, we instantiate a heterogeneous version of DP-SGD, where the noise injected into gradient updates is heterogeneous and guided by prior-established model parameters.
We conduct comprehensive experiments to verify and explain the effectiveness of the proposed DP-Hero, showing improved training accuracy compared with state-of-the-art works.
arXiv Detail & Related papers (2024-09-05T08:40:54Z)
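DP-Hero makes the injected noise heterogeneous and guided by prior-established model parameters. The abstract does not say how the noise is shaped, so the per-coordinate allocation below is purely hypothetical; it only illustrates "non-isotropic noise steered by pre-existing knowledge", and any real scheme needs its own privacy accounting.

```python
import numpy as np

def hetero_noise_step(clipped_sum, prior_params, batch_size,
                      C=1.0, sigma=1.0, rng=None):
    """Add heterogeneous Gaussian noise guided by pre-existing parameters
    (e.g. a publicly pretrained model) -- hypothetical allocation.

    clipped_sum: sum of per-sample-clipped gradients, shape (dim,).
    prior_params: pre-existing parameter vector of the same shape.
    """
    rng = rng or np.random.default_rng()
    d = clipped_sum.shape[0]

    # Hypothetical rule: relatively less noise on coordinates where the prior
    # weights are large, more where they are small, with the average variance
    # matching an isotropic baseline of std sigma * C.
    inv = 1.0 / (np.abs(prior_params) + 1e-6)
    scales = inv / inv.mean()                      # mean(scales) == 1
    noise = rng.normal(0.0, 1.0, size=d) * sigma * C * np.sqrt(scales)
    return (clipped_sum + noise) / batch_size
```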
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
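Error feedback, as known from compressed SGD, keeps an accumulator of what clipping removed and feeds it back into subsequent steps. The sketch below shows that generic pattern only; the paper's algorithm and its Rényi-DP analysis handle the data-dependence of the accumulator, which this toy version ignores.

```python
import numpy as np

def ef_dp_step(per_sample_grads, error, C=1.0, sigma=1.0, rng=None):
    """Generic error-feedback DP step (illustrative, not the paper's method).

    error: accumulator of the part of the average gradient removed by
    clipping in earlier steps; it is added back before clipping.
    """
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape

    corrected = per_sample_grads + error            # feed back past clipping error
    norms = np.linalg.norm(corrected, axis=1, keepdims=True)
    clipped = corrected * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    new_error = (corrected - clipped).mean(axis=0)  # what clipping removed this step
    noisy = (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B
    return noisy, new_error

# error starts at zero and is threaded through the training loop:
# update, error = ef_dp_step(grads, error)
```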
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
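The connection between per-sample gradient norms and estimator bias can be seen directly: clipping only distorts the mean gradient when some per-sample norms exceed the bound. The snippet below measures that bias empirically; it illustrates the phenomenon BAM targets, not the BAM objective itself.

```python
import numpy as np

def clipping_bias(per_sample_grads, C=1.0):
    """Gap between the clean mean gradient and the mean of clipped gradients."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return per_sample_grads.mean(axis=0) - clipped.mean(axis=0)

g = np.random.default_rng(0).normal(size=(16, 5)) * 3.0
print(np.linalg.norm(clipping_bias(g, C=1.0)))    # large norms get clipped: bias > 0
print(np.linalg.norm(clipping_bias(g, C=100.0)))  # nothing clipped: bias ~ 0
```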
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
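The per-sample processing that distinguishes the two algorithms can be written side by side: DP-SGD rescales only the gradients whose norm exceeds the bound, while DP-NSGD normalizes every gradient with a small regularizer. A minimal sketch, with illustrative constants:

```python
import numpy as np

def per_sample_clip(g, C=1.0):
    """DP-SGD: shrink only the gradients whose L2 norm exceeds C."""
    n = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, C / np.maximum(n, 1e-12))

def per_sample_normalize(g, r=0.01):
    """DP-NSGD: normalize every gradient, with a regularizer r that keeps
    very small gradients from being blown up to unit norm."""
    n = np.linalg.norm(g, axis=1, keepdims=True)
    return g / (n + r)

# Either output is then summed, perturbed with Gaussian noise calibrated to
# the per-sample bound, and averaged, as in the rest of DP-SGD.
```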
- Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that stochastic gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first known result for the non-smooth, strongly-concave setting.
arXiv Detail & Related papers (2022-01-22T13:05:39Z)
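For minimax problems the private update acts on both players. The sketch below is a generic noisy stochastic gradient descent ascent step with per-sample clipping for each block of variables, not the specific DP-SGDA variant whose utility bounds the paper proves.

```python
import numpy as np

def dp_sgda_step(x, y, grads_x, grads_y, lr=0.1, C=1.0, sigma=1.0, rng=None):
    """Noisy descent on x (min player) and ascent on y (max player)."""
    rng = rng or np.random.default_rng()

    def privatize(per_sample):
        B, d = per_sample.shape
        n = np.linalg.norm(per_sample, axis=1, keepdims=True)
        clipped = per_sample * np.minimum(1.0, C / np.maximum(n, 1e-12))
        return (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B

    x_new = x - lr * privatize(grads_x)   # descent on the primal variable
    y_new = y + lr * privatize(grads_y)   # ascent on the dual variable
    return x_new, y_new
```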
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
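Projecting the noisy update into a low-dimensional gradient subspace makes the added noise scale with the subspace dimension k rather than the ambient dimension p. The sketch below assumes an orthonormal basis V is already available; identifying that basis is the paper's contribution and is not shown here.

```python
import numpy as np

def projected_dp_step(clipped_sum, V, batch_size, C=1.0, sigma=1.0, rng=None):
    """Add noise inside a k-dimensional subspace instead of all p dimensions.

    clipped_sum: sum of per-sample-clipped gradients, shape (p,).
    V: orthonormal basis of the subspace, shape (p, k), with V.T @ V = I.
    """
    rng = rng or np.random.default_rng()
    k = V.shape[1]
    low_dim = V.T @ clipped_sum                           # project to k dims
    noisy = low_dim + rng.normal(0.0, sigma * C, size=k)  # noise grows with k, not p
    return (V @ noisy) / batch_size                       # map back to ambient space
```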
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
- Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation [12.880889651679094]
We study how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
We propose term-wise DP-SGD that crafts randomized gradients in two different ways tailored to the compositions of the loss terms.
arXiv Detail & Related papers (2020-06-19T16:12:28Z)
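Term-wise DP-SGD treats the gradient contributions of different loss terms separately. The two randomization schemes are defined in the paper; the sketch below only illustrates the general idea of privatizing each term's per-sample gradient with its own (assumed) clipping bound and noise before combining them.

```python
import numpy as np

def termwise_dp_update(grads_recon, grads_kl, C_recon=1.0, C_kl=0.1,
                       sigma=1.0, rng=None):
    """Privatize two loss terms separately (illustrative, assumed clip bounds).

    grads_recon, grads_kl: per-sample gradients of the reconstruction and
    KL terms of a VAE loss, each of shape (batch, dim).
    """
    rng = rng or np.random.default_rng()

    def privatize(per_sample, C):
        B, d = per_sample.shape
        n = np.linalg.norm(per_sample, axis=1, keepdims=True)
        clipped = per_sample * np.minimum(1.0, C / np.maximum(n, 1e-12))
        return (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B

    # Each term gets its own sensitivity bound and noise; the overall privacy
    # cost must account for both releases (composition), which is omitted here.
    return privatize(grads_recon, C_recon) + privatize(grads_kl, C_kl)
```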
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.