Bypassing the Ambient Dimension: Private SGD with Gradient Subspace
Identification
- URL: http://arxiv.org/abs/2007.03813v2
- Date: Fri, 23 Apr 2021 23:07:28 GMT
- Title: Bypassing the Ambient Dimension: Private SGD with Gradient Subspace
Identification
- Authors: Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee
- Abstract summary: Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
- Score: 47.23063195722975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private SGD (DP-SGD) is one of the most popular methods for
solving differentially private empirical risk minimization (ERM). Due to its
noisy perturbation on each gradient update, the error rate of DP-SGD scales
with the ambient dimension $p$, the number of parameters in the model. Such
dependence can be problematic for over-parameterized models where $p \gg n$,
the number of training samples. Existing lower bounds on private ERM show that
such dependence on $p$ is inevitable in the worst case. In this paper, we
circumvent the dependence on the ambient dimension by leveraging a
low-dimensional structure of gradient space in deep networks -- that is, the
stochastic gradients for deep nets usually stay in a low dimensional subspace
in the training process. We propose Projected DP-SGD that performs noise
reduction by projecting the noisy gradients to a low-dimensional subspace,
which is given by the top gradient eigenspace on a small public dataset. We
provide a general sample complexity analysis on the public dataset for the
gradient subspace identification problem and demonstrate that under certain
low-dimensional assumptions the public sample complexity only grows
logarithmically in $p$. Finally, we provide a theoretical analysis and
empirical evaluations to show that our method can substantially improve the
accuracy of DP-SGD in the high privacy regime (corresponding to low privacy
loss $\epsilon$).
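The projection idea in the abstract can be illustrated with a minimal numpy sketch (all function and parameter names here are hypothetical, not the authors' implementation): per-example private gradients are clipped, summed, perturbed with Gaussian noise as in standard DP-SGD, and the noisy sum is then projected onto the top-$k$ eigenspace estimated from gradients on a small public dataset.

```python
import numpy as np

def projected_dp_sgd_step(params, private_grads, public_grads, k,
                          clip_norm, noise_mult, lr, rng):
    """Illustrative sketch of one Projected DP-SGD update."""
    # Estimate the top-k gradient eigenspace from the public dataset:
    # the leading right singular vectors of the stacked public gradients.
    G_pub = np.stack(public_grads)                 # shape (m, p)
    _, _, Vt = np.linalg.svd(G_pub, full_matrices=False)
    V = Vt[:k].T                                   # (p, k) orthonormal basis

    # Clip each private per-example gradient to bound its sensitivity.
    total = np.zeros_like(params)
    for g in private_grads:
        norm = np.linalg.norm(g)
        total += g * min(1.0, clip_norm / norm) if norm > 0 else g

    # Add Gaussian noise calibrated to the clipping norm (as in DP-SGD).
    noisy = total + rng.normal(0.0, noise_mult * clip_norm, size=total.shape)

    # Project onto the public subspace: the noise component orthogonal to
    # the top-k eigenspace is discarded, removing the dependence on p.
    projected = V @ (V.T @ noisy)
    return params - lr * projected / len(private_grads)
```

Because the projection happens after the noise is added, the privacy guarantee of the underlying DP-SGD step is unaffected, while the effective noise dimension drops from $p$ to $k$.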
Related papers
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
- High-Dimensional Private Empirical Risk Minimization by Greedy Coordinate Descent [11.49109939095326]
We study differentially private empirical risk minimization (DP-ERM).
We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems.
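As a rough illustration of the greedy coordinate descent idea (the helper name and noise calibration below are illustrative assumptions, not the paper's exact algorithm): each step privately selects the coordinate with the largest gradient magnitude via report-noisy-max and takes a noisy step on that single coordinate, so the update cost and the noise scale with the number of steps rather than the full dimension.

```python
import numpy as np

def dp_gcd_step(w, grad_fn, step_size, lam_select, lam_update, rng):
    """Sketch of one private greedy coordinate descent step."""
    g = grad_fn(w)
    # Report-noisy-max: Laplace noise on |g_j| makes the argmax private.
    noisy_mag = np.abs(g) + rng.laplace(0.0, lam_select, size=g.shape)
    j = int(np.argmax(noisy_mag))
    # Update only the selected coordinate, with noise on the step itself.
    w = w.copy()
    w[j] -= step_size * (g[j] + rng.laplace(0.0, lam_update))
    return w
```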
arXiv Detail & Related papers (2022-07-04T16:27:00Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Deep learning, stochastic gradient descent and diffusion maps [0.0]
Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency.
It has been observed that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep networks are close to zero.
Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold.
arXiv Detail & Related papers (2022-04-04T10:19:39Z)
- Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private SGD (DP-SGD) has been widely adopted in deep learning to provide a rigorously defined privacy bound.
We propose to utilize random sparsification (RS) of gradients to reduce communication cost and strengthen the privacy bound.
arXiv Detail & Related papers (2021-12-01T21:43:34Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
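The per-example clipping step described above can be sketched in a few lines of numpy (the helper name and parameters are illustrative): each example's gradient is rescaled so its L2 norm is at most the threshold C, the clipped gradients are summed, and Gaussian noise scaled to the sensitivity C is added.

```python
import numpy as np

def dp_sgd_clip_and_noise(per_example_grads, C, sigma, rng):
    """Sketch of the standard DP-SGD clipped, noised gradient sum."""
    total = np.zeros_like(per_example_grads[0])
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Shrink the gradient only when its L2 norm exceeds the threshold C.
        total += g * min(1.0, C / norm) if norm > 0 else g
    # Gaussian noise calibrated to the sensitivity C of the clipped sum.
    return total + rng.normal(0.0, sigma * C, size=total.shape)
```

Clipping bounds how much any single example can change the sum, which is exactly the sensitivity the added Gaussian noise must mask.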
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.