Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails
- URL: http://arxiv.org/abs/2405.17529v1
- Date: Mon, 27 May 2024 16:30:11 GMT
- Title: Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails
- Authors: Haichao Sha, Yang Cao, Yong Liu, Yuncheng Wu, Ruixuan Liu, Hong Chen,
- Abstract summary: Differentially Private Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning.
DPSGD clips the gradients to a norm and then injects a calibrated noise into the training procedure.
We propose a novel approach, Discriminative(DC)-DPSGD, with two key iterations.
- Score: 20.432871178766927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that the gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient have infinite variance, which may lead to excessive clipping loss to the gradients with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping~(DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds for body and tail gradients to reduce the clipping loss. Under the non-convex condition, \ourtech{} reduces the empirical gradient norm from {${\mathbb{O}\left(\log^{\max(0,\theta-1)}(T/\delta)\log^{2\theta}(\sqrt{T})\right)}$} to {${\mathbb{O}\left(\log(\sqrt{T})\right)}$} with heavy-tailed index $\theta\geq 1/2$, iterations $T$, and arbitrary probability $\delta$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72\% in terms of accuracy.
Related papers
- Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates [15.27596975662702]
We explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients.
Our results match the minimax lower bound in citekamath2022, indicating that the theoretical limit of convex optimization under DP is achievable.
arXiv Detail & Related papers (2024-08-19T11:07:05Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Clip21: Error Feedback for Gradient Clipping [8.979288425347702]
We design Clip21 -- the first provably effective and practically useful feedback mechanism for distributed methods.
Our method converges faster in practice than competing methods.
arXiv Detail & Related papers (2023-05-30T10:41:42Z) - EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections
for Federated Learning with Heterogeneous Data [9.379890125442333]
Gradient clipping is an important technique for deep neural networks with exploding gradients, such as recurrent neural networks.
Recent datasets have shown that the loss functions do not satisfy the conventional smoothness condition, but satisfy a relaxed linear condition, i.e. relaxed smoothness.
EPISODE resamples from each client and obtains the global gradient, is used to determine whether to apply gradient clipping for the entire client.
arXiv Detail & Related papers (2023-02-14T16:05:51Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Gradient Correction beyond Gradient Descent [63.33439072360198]
gradient correction is apparently the most crucial aspect for the training of a neural network.
We introduce a framework (textbfGCGD) to perform gradient correction.
Experiment results show that our gradient correction framework can effectively improve the gradient quality to reduce training epochs by $sim$ 20% and also improve the network performance.
arXiv Detail & Related papers (2022-03-16T01:42:25Z) - Large Scale Private Learning via Low-rank Reparametrization [77.38947817228656]
We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks.
We are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83.9%$ on four downstream tasks.
arXiv Detail & Related papers (2021-06-17T10:14:43Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $tilde O(mathsfOPT1/2) + varepsilon$ in $mathrmpoly(d,1/varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.