Bypassing the Ambient Dimension: Private SGD with Gradient Subspace
Identification
- URL: http://arxiv.org/abs/2007.03813v2
- Date: Fri, 23 Apr 2021 23:07:28 GMT
- Title: Bypassing the Ambient Dimension: Private SGD with Gradient Subspace
Identification
- Authors: Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee
- Abstract summary: Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
- Score: 47.23063195722975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private SGD (DP-SGD) is one of the most popular methods for
solving differentially private empirical risk minimization (ERM). Due to its
noisy perturbation on each gradient update, the error rate of DP-SGD scales
with the ambient dimension $p$, the number of parameters in the model. Such
dependence can be problematic for over-parameterized models where $p \gg n$,
the number of training samples. Existing lower bounds on private ERM show that
such dependence on $p$ is inevitable in the worst case. In this paper, we
circumvent the dependence on the ambient dimension by leveraging a
low-dimensional structure of gradient space in deep networks -- that is, the
stochastic gradients for deep nets usually stay in a low dimensional subspace
in the training process. We propose Projected DP-SGD that performs noise
reduction by projecting the noisy gradients to a low-dimensional subspace,
which is given by the top gradient eigenspace on a small public dataset. We
provide a general sample complexity analysis on the public dataset for the
gradient subspace identification problem and demonstrate that under certain
low-dimensional assumptions the public sample complexity only grows
logarithmically in $p$. Finally, we provide a theoretical analysis and
empirical evaluations to show that our method can substantially improve the
accuracy of DP-SGD in the high privacy regime (corresponding to low privacy
loss $\epsilon$).
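The projection idea in the abstract can be illustrated with a minimal numpy sketch (all function and parameter names here are hypothetical, not the authors' implementation): per-example private gradients are clipped, summed, perturbed with Gaussian noise as in standard DP-SGD, and the noisy sum is then projected onto the top-$k$ eigenspace estimated from gradients on a small public dataset.

```python
import numpy as np

def projected_dp_sgd_step(params, private_grads, public_grads, k,
                          clip_norm, noise_mult, lr, rng):
    """Illustrative sketch of one Projected DP-SGD update."""
    # Estimate the top-k gradient eigenspace from the public dataset:
    # the leading right singular vectors of the stacked public gradients.
    G_pub = np.stack(public_grads)                 # shape (m, p)
    _, _, Vt = np.linalg.svd(G_pub, full_matrices=False)
    V = Vt[:k].T                                   # (p, k) orthonormal basis

    # Clip each private per-example gradient to bound its sensitivity.
    total = np.zeros_like(params)
    for g in private_grads:
        norm = np.linalg.norm(g)
        total += g * min(1.0, clip_norm / norm) if norm > 0 else g

    # Add Gaussian noise calibrated to the clipping norm (as in DP-SGD).
    noisy = total + rng.normal(0.0, noise_mult * clip_norm, size=total.shape)

    # Project onto the public subspace: the noise component orthogonal to
    # the top-k eigenspace is discarded, removing the dependence on p.
    projected = V @ (V.T @ noisy)
    return params - lr * projected / len(private_grads)
```

Because the projection happens after the noise is added, the privacy guarantee of the underlying DP-SGD step is unaffected, while the effective noise dimension drops from $p$ to $k$.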
Related papers
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
- High-Dimensional Private Empirical Risk Minimization by Greedy Coordinate Descent [11.49109939095326]
We study differentially private empirical risk minimization (DP-ERM).
We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems.
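As a rough illustration of the greedy coordinate descent idea (the helper name and noise calibration below are illustrative assumptions, not the paper's exact algorithm): each step privately selects the coordinate with the largest gradient magnitude via report-noisy-max and takes a noisy step on that single coordinate, so the update cost and the noise scale with the number of steps rather than the full dimension.

```python
import numpy as np

def dp_gcd_step(w, grad_fn, step_size, lam_select, lam_update, rng):
    """Sketch of one private greedy coordinate descent step."""
    g = grad_fn(w)
    # Report-noisy-max: Laplace noise on |g_j| makes the argmax private.
    noisy_mag = np.abs(g) + rng.laplace(0.0, lam_select, size=g.shape)
    j = int(np.argmax(noisy_mag))
    # Update only the selected coordinate, with noise on the step itself.
    w = w.copy()
    w[j] -= step_size * (g[j] + rng.laplace(0.0, lam_update))
    return w
```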
arXiv Detail & Related papers (2022-07-04T16:27:00Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Deep learning, stochastic gradient descent and diffusion maps [0.0]
Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency.
It has been observed that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep networks are close to zero.
Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold.
arXiv Detail & Related papers (2022-04-04T10:19:39Z)
- Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private SGD (DP-SGD) has been widely adopted in deep learning to provide a rigorously defined privacy bound.
We propose to utilize random sparsification (RS) of gradients to reduce communication cost and strengthen the privacy bound.
arXiv Detail & Related papers (2021-12-01T21:43:34Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
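The per-example clipping step described above can be sketched in a few lines of numpy (the helper name and parameters are illustrative): each example's gradient is rescaled so its L2 norm is at most the threshold C, the clipped gradients are summed, and Gaussian noise scaled to the sensitivity C is added.

```python
import numpy as np

def dp_sgd_clip_and_noise(per_example_grads, C, sigma, rng):
    """Sketch of the standard DP-SGD clipped, noised gradient sum."""
    total = np.zeros_like(per_example_grads[0])
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Shrink the gradient only when its L2 norm exceeds the threshold C.
        total += g * min(1.0, C / norm) if norm > 0 else g
    # Gaussian noise calibrated to the sensitivity C of the clipped sum.
    return total + rng.normal(0.0, sigma * C, size=total.shape)
```

Clipping bounds how much any single example can change the sum, which is exactly the sensitivity the added Gaussian noise must mask.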
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.