Improving Differentially Private SGD via Randomly Sparsified Gradients
- URL: http://arxiv.org/abs/2112.00845v3
- Date: Wed, 28 Jun 2023 13:30:48 GMT
- Title: Improving Differentially Private SGD via Randomly Sparsified Gradients
- Authors: Junyi Zhu, Matthew B. Blaschko
- Abstract summary: Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy. We propose an efficient and lightweight extension using random sparsification (RS) to strengthen DP-SGD; the resulting sparse gradients also reduce communication cost and strengthen privacy against reconstruction attacks.
- Score: 31.295035726077366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private stochastic gradient descent (DP-SGD) has been widely
adopted in deep learning to provide rigorously defined privacy, which requires
gradient clipping to bound the maximum norm of individual gradients and
additive isotropic Gaussian noise. With analysis of the convergence rate of
DP-SGD in a non-convex setting, we identify that randomly sparsifying gradients
before clipping and noisification adjusts a trade-off between internal
components of the convergence bound and leads to a smaller upper bound when the
noise is dominant. Additionally, our theoretical analysis and empirical
evaluations show that the trade-off is not trivial but possibly a unique
property of DP-SGD, as either canceling noisification or gradient clipping
eliminates the trade-off in the bound. This observation is indicative, as it
implies DP-SGD has special inherent room for (even simply random) gradient
compression. To verify the observation and utilize it, we propose an efficient
and lightweight extension using random sparsification (RS) to strengthen
DP-SGD. Experiments with various DP-SGD frameworks show that RS can improve
performance. Additionally, the produced sparse gradients of RS exhibit
advantages in reducing communication cost and strengthening privacy against
reconstruction attacks, which are also key problems in private machine
learning.
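The pipeline described in the abstract (randomly sparsify each gradient, then clip, then add Gaussian noise) can be summarized in a short sketch. This is a minimal illustration, not the authors' implementation: the keep probability p, clip bound C, and noise multiplier sigma are placeholder values, and details such as whether the mask is shared across the batch are simplifications.

```python
import numpy as np

def dpsgd_rs_step(per_sample_grads, p=0.5, C=1.0, sigma=1.0, rng=None):
    """One DP-SGD update direction with random sparsification (RS) -- sketch.

    per_sample_grads: array of shape (batch, dim), one gradient per example.
    p: probability of keeping each coordinate (illustrative value).
    C: per-sample L2 clipping bound; sigma: Gaussian noise multiplier.
    """
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape

    # 1) Random sparsification: drop a random subset of coordinates before
    #    clipping and noisification (one shared mask, for simplicity).
    mask = rng.random(d) < p
    g = per_sample_grads * mask

    # 2) Per-sample clipping to bound each individual gradient's L2 norm by C.
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    # 3) Sum, add isotropic Gaussian noise calibrated to C on the kept
    #    coordinates, and average over the batch.
    noise = rng.normal(0.0, sigma * C, size=d) * mask
    return (g.sum(axis=0) + noise) / B

# Toy usage
grads = np.random.default_rng(0).normal(size=(8, 16))
update = dpsgd_rs_step(grads, p=0.5)
```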
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC)
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv Detail & Related papers (2024-11-05T12:47:30Z)
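DP-PSASC replaces hard clipping with non-monotonous adaptive per-sample scaling. The abstract does not give the scaling rule, so the weight function w below is a hypothetical stand-in chosen only so that each scaled gradient stays within a sensitivity bound C; the actual DP-PSASC rule is defined in the paper.

```python
import numpy as np

def adaptive_scaling_step(per_sample_grads, C=1.0, sigma=1.0, rng=None):
    """Per-sample adaptive scaling in place of hard clipping (illustrative)."""
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape
    norms = np.linalg.norm(per_sample_grads, axis=1)

    # Hypothetical non-monotonic weight: w(n) = C * n / (n^2 + eps), so the
    # scaled norm w(n) * n = C * n^2 / (n^2 + eps) never exceeds C.
    eps = 1e-3
    w = C * norms / (norms ** 2 + eps)
    scaled = per_sample_grads * w[:, None]

    # Noise is still calibrated to the per-sample bound C, as in DP-SGD.
    noise = rng.normal(0.0, sigma * C, size=d)
    return (scaled.sum(axis=0) + noise) / B
```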
- Rethinking Improved Privacy-Utility Trade-off with Pre-existing Knowledge for DP Training [31.559864332056648]
We propose a generic differential privacy framework with heterogeneous noise (DP-Hero).
Atop DP-Hero, we instantiate a heterogeneous version of DP-SGD, where the noise injected into gradient updates is heterogeneous and guided by prior-established model parameters.
We conduct comprehensive experiments to verify and explain the effectiveness of the proposed DP-Hero, showing improved training accuracy compared with state-of-the-art works.
arXiv Detail & Related papers (2024-09-05T08:40:54Z)
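DP-Hero makes the injected noise heterogeneous and guided by prior-established model parameters. The abstract does not say how the noise is shaped, so the per-coordinate allocation below is purely hypothetical; it only illustrates "non-isotropic noise steered by pre-existing knowledge", and any real scheme needs its own privacy accounting.

```python
import numpy as np

def hetero_noise_step(clipped_sum, prior_params, batch_size,
                      C=1.0, sigma=1.0, rng=None):
    """Add heterogeneous Gaussian noise guided by pre-existing parameters
    (e.g. a publicly pretrained model) -- hypothetical allocation.

    clipped_sum: sum of per-sample-clipped gradients, shape (dim,).
    prior_params: pre-existing parameter vector of the same shape.
    """
    rng = rng or np.random.default_rng()
    d = clipped_sum.shape[0]

    # Hypothetical rule: relatively less noise on coordinates where the prior
    # weights are large, more where they are small, with the average variance
    # matching an isotropic baseline of std sigma * C.
    inv = 1.0 / (np.abs(prior_params) + 1e-6)
    scales = inv / inv.mean()                      # mean(scales) == 1
    noise = rng.normal(0.0, 1.0, size=d) * sigma * C * np.sqrt(scales)
    return (clipped_sum + noise) / batch_size
```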
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
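Error feedback, as known from compressed SGD, keeps an accumulator of what clipping removed and feeds it back into subsequent steps. The sketch below shows that generic pattern only; the paper's algorithm and its Rényi-DP analysis handle the data-dependence of the accumulator, which this toy version ignores.

```python
import numpy as np

def ef_dp_step(per_sample_grads, error, C=1.0, sigma=1.0, rng=None):
    """Generic error-feedback DP step (illustrative, not the paper's method).

    error: accumulator of the part of the average gradient removed by
    clipping in earlier steps; it is added back before clipping.
    """
    rng = rng or np.random.default_rng()
    B, d = per_sample_grads.shape

    corrected = per_sample_grads + error            # feed back past clipping error
    norms = np.linalg.norm(corrected, axis=1, keepdims=True)
    clipped = corrected * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    new_error = (corrected - clipped).mean(axis=0)  # what clipping removed this step
    noisy = (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B
    return noisy, new_error

# error starts at zero and is threaded through the training loop:
# update, error = ef_dp_step(grads, error)
```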
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
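The connection between per-sample gradient norms and estimator bias can be seen directly: clipping only distorts the mean gradient when some per-sample norms exceed the bound. The snippet below measures that bias empirically; it illustrates the phenomenon BAM targets, not the BAM objective itself.

```python
import numpy as np

def clipping_bias(per_sample_grads, C=1.0):
    """Gap between the clean mean gradient and the mean of clipped gradients."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return per_sample_grads.mean(axis=0) - clipped.mean(axis=0)

g = np.random.default_rng(0).normal(size=(16, 5)) * 3.0
print(np.linalg.norm(clipping_bias(g, C=1.0)))    # large norms get clipped: bias > 0
print(np.linalg.norm(clipping_bias(g, C=100.0)))  # nothing clipped: bias ~ 0
```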
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
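The per-sample processing that distinguishes the two algorithms can be written side by side: DP-SGD rescales only the gradients whose norm exceeds the bound, while DP-NSGD normalizes every gradient with a small regularizer. A minimal sketch, with illustrative constants:

```python
import numpy as np

def per_sample_clip(g, C=1.0):
    """DP-SGD: shrink only the gradients whose L2 norm exceeds C."""
    n = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, C / np.maximum(n, 1e-12))

def per_sample_normalize(g, r=0.01):
    """DP-NSGD: normalize every gradient, with a regularizer r that keeps
    very small gradients from being blown up to unit norm."""
    n = np.linalg.norm(g, axis=1, keepdims=True)
    return g / (n + r)

# Either output is then summed, perturbed with Gaussian noise calibrated to
# the per-sample bound, and averaged, as in the rest of DP-SGD.
```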
- Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that stochastic gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first known result for the non-smooth, strongly-concave setting.
arXiv Detail & Related papers (2022-01-22T13:05:39Z)
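For minimax problems the private update acts on both players. The sketch below is a generic noisy stochastic gradient descent ascent step with per-sample clipping for each block of variables, not the specific DP-SGDA variant whose utility bounds the paper proves.

```python
import numpy as np

def dp_sgda_step(x, y, grads_x, grads_y, lr=0.1, C=1.0, sigma=1.0, rng=None):
    """Noisy descent on x (min player) and ascent on y (max player)."""
    rng = rng or np.random.default_rng()

    def privatize(per_sample):
        B, d = per_sample.shape
        n = np.linalg.norm(per_sample, axis=1, keepdims=True)
        clipped = per_sample * np.minimum(1.0, C / np.maximum(n, 1e-12))
        return (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B

    x_new = x - lr * privatize(grads_x)   # descent on the primal variable
    y_new = y + lr * privatize(grads_y)   # ascent on the dual variable
    return x_new, y_new
```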
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
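Projecting the noisy update into a low-dimensional gradient subspace makes the added noise scale with the subspace dimension k rather than the ambient dimension p. The sketch below assumes an orthonormal basis V is already available; identifying that basis is the paper's contribution and is not shown here.

```python
import numpy as np

def projected_dp_step(clipped_sum, V, batch_size, C=1.0, sigma=1.0, rng=None):
    """Add noise inside a k-dimensional subspace instead of all p dimensions.

    clipped_sum: sum of per-sample-clipped gradients, shape (p,).
    V: orthonormal basis of the subspace, shape (p, k), with V.T @ V = I.
    """
    rng = rng or np.random.default_rng()
    k = V.shape[1]
    low_dim = V.T @ clipped_sum                           # project to k dims
    noisy = low_dim + rng.normal(0.0, sigma * C, size=k)  # noise grows with k, not p
    return (V @ noisy) / batch_size                       # map back to ambient space
```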
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
- Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation [12.880889651679094]
We study how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
We propose term-wise DP-SGD that crafts randomized gradients in two different ways tailored to the compositions of the loss terms.
arXiv Detail & Related papers (2020-06-19T16:12:28Z)
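Term-wise DP-SGD treats the gradient contributions of different loss terms separately. The two randomization schemes are defined in the paper; the sketch below only illustrates the general idea of privatizing each term's per-sample gradient with its own (assumed) clipping bound and noise before combining them.

```python
import numpy as np

def termwise_dp_update(grads_recon, grads_kl, C_recon=1.0, C_kl=0.1,
                       sigma=1.0, rng=None):
    """Privatize two loss terms separately (illustrative, assumed clip bounds).

    grads_recon, grads_kl: per-sample gradients of the reconstruction and
    KL terms of a VAE loss, each of shape (batch, dim).
    """
    rng = rng or np.random.default_rng()

    def privatize(per_sample, C):
        B, d = per_sample.shape
        n = np.linalg.norm(per_sample, axis=1, keepdims=True)
        clipped = per_sample * np.minimum(1.0, C / np.maximum(n, 1e-12))
        return (clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d)) / B

    # Each term gets its own sensitivity bound and noise; the overall privacy
    # cost must account for both releases (composition), which is omitted here.
    return privatize(grads_recon, C_recon) + privatize(grads_kl, C_kl)
```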
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.