Efficient Private SCO for Heavy-Tailed Data via Clipping
- URL: http://arxiv.org/abs/2206.13011v1
- Date: Mon, 27 Jun 2022 01:39:15 GMT
- Title: Efficient Private SCO for Heavy-Tailed Data via Clipping
- Authors: Chenhan Jin, Kaiwen Zhou, Bo Han, James Cheng, Ming-Chang Yang
- Abstract summary: We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP).
For general convex problems, we derive excess population risks $\Tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\Tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under bounded or unbounded domain assumptions, respectively.
- Score: 31.37972684061572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider stochastic convex optimization for heavy-tailed data with the
guarantee of being differentially private (DP). Prior work on this problem is
restricted to the gradient descent (GD) method, which is inefficient for
large-scale problems. In this paper, we resolve this issue and derive the first
high-probability bounds for private stochastic method with clipping. For
general convex problems, we derive excess population risks
$\Tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n \epsilon)^2}{\beta
d}}}{(n\epsilon)^{2/7}}\right)$ and
$\Tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta
d}}{(n\epsilon)^{2/7}}\right)$ under bounded or unbounded domain assumption,
respectively (here $n$ is the sample size, $d$ is the dimension of the data,
$\beta$ is the confidence level and $\epsilon$ is the private level). Then, we
extend our analysis to the strongly convex case and non-smooth case (which
works for generalized smooth objectives with Hölder-continuous
gradients). We establish new excess risk bounds without bounded domain
assumption. The results above achieve lower excess risks and gradient
complexities than existing methods in their corresponding cases. Numerical
experiments are conducted to justify the theoretical improvement.
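To give a rough illustration of the clipping-based private stochastic method the abstract refers to (this is not the authors' exact algorithm), the sketch below clips each per-sample gradient to a norm threshold and perturbs the averaged update with Gaussian noise; the squared loss, clipping threshold, noise multiplier, and step size are placeholder assumptions.

```python
import numpy as np

def dp_clipped_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One generic DP-SGD-style step with per-sample gradient clipping.

    Illustrative sketch only: squared loss, fixed clip threshold `clip`, and
    Gaussian noise with std `noise_mult * clip` are all placeholder choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    grads = np.zeros((n, d))
    for i in range(n):
        residual = X[i] @ w - y[i]          # per-sample squared-loss gradient
        g = residual * X[i]
        norm = np.linalg.norm(g)
        grads[i] = g * min(1.0, clip / (norm + 1e-12))   # clip to norm <= clip
    # Gaussian mechanism: noise scale proportional to the clipping threshold.
    noise = rng.normal(0.0, noise_mult * clip, size=d)
    noisy_avg = (grads.sum(axis=0) + noise) / n
    return w - lr * noisy_avg

# tiny usage example on synthetic heavy-tailed data
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(256, 5))    # heavy-tailed features
w_true = rng.normal(size=5)
y = X @ w_true + rng.standard_t(df=2.5, size=256)
w = np.zeros(5)
for _ in range(200):
    w = dp_clipped_sgd_step(w, X, y, rng=rng)
```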
Related papers
- The Cost of Shuffling in Private Gradient Based Optimization [40.31928071333575]
We show that data shuffling results in worse empirical excess risk for DP-ShuffleG compared to DP-SGD.
We propose Interleaved-ShuffleG, a hybrid approach that integrates public data samples in private optimization.
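A minimal sketch of the interleaving idea described above, assuming a generic gradient oracle `grad_fn` and placeholder constants (this is not the paper's Interleaved-ShuffleG): alternate a clipped, noised step on a private batch with an ordinary step on a public batch.

```python
import numpy as np

def interleaved_steps(w, private_batch, public_batch, grad_fn,
                      lr=0.05, clip=1.0, noise_std=1.0, rng=None):
    """Sketch: one clipped+noised step on private data, one plain step on public data."""
    rng = np.random.default_rng() if rng is None else rng
    # private step: clip per-sample gradients and add Gaussian noise
    gs = np.array([grad_fn(w, x, y) for x, y in private_batch])
    norms = np.linalg.norm(gs, axis=1, keepdims=True)
    gs = gs * np.minimum(1.0, clip / (norms + 1e-12))
    g_priv = (gs.sum(axis=0) + rng.normal(0.0, noise_std * clip, size=w.shape)) / len(private_batch)
    w = w - lr * g_priv
    # public step: no clipping or noise needed for public samples
    g_pub = np.mean([grad_fn(w, x, y) for x, y in public_batch], axis=0)
    return w - lr * g_pub
```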
arXiv Detail & Related papers (2025-02-05T22:30:00Z) - On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Stochastic gradient descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
We show how QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but can be mitigated through a carefully designed quantile and step size schedule.
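To make the quantile-clipping idea concrete, here is a hedged sketch (not the paper's QC-SGD): the clipping threshold is the empirical q-quantile of per-sample gradient norms in the batch, and a 1/sqrt(t) step size stands in for the carefully designed schedule; in a real DP method the quantile itself would have to be estimated privately, which this sketch ignores.

```python
import numpy as np

def quantile_clipped_step(w, per_sample_grads, step, q=0.5,
                          lr0=0.5, noise_mult=1.0, rng=None):
    """One noisy SGD step whose clipping threshold is the q-quantile of gradient norms.

    Sketch only: the quantile level q and the 1/sqrt(t) step size are
    placeholder choices standing in for the paper's schedules.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_sample_grads, axis=1)
    threshold = np.quantile(norms, q)               # adaptive clipping threshold
    scale = np.minimum(1.0, threshold / (norms + 1e-12))
    clipped = per_sample_grads * scale[:, None]
    noise = rng.normal(0.0, noise_mult * threshold, size=w.shape)
    g = (clipped.sum(axis=0) + noise) / len(per_sample_grads)
    lr = lr0 / np.sqrt(step + 1)                    # decaying step size schedule
    return w - lr * g
```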
arXiv Detail & Related papers (2024-12-27T20:29:47Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
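The error-feedback idea can be illustrated with a generic sketch (not the cited algorithm, and without its Rényi-DP accounting): carry forward the part of the gradient that clipping removed and add it back before the next clipping step; all names and constants here are assumptions.

```python
import numpy as np

def ef_clipped_step(w, grad, error, lr=0.1, clip=1.0, noise_std=1.0, rng=None):
    """One error-feedback clipped step: clip (grad + carried error), add noise,
    and carry forward the part that clipping removed.

    Generic error-feedback sketch, not a faithful reproduction of the cited algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    corrected = grad + error                       # add back previously clipped mass
    norm = np.linalg.norm(corrected)
    clipped = corrected * min(1.0, clip / (norm + 1e-12))
    new_error = corrected - clipped                # residual removed by clipping
    noisy = clipped + rng.normal(0.0, noise_std * clip, size=w.shape)
    return w - lr * noisy, new_error
```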
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm are used as clipping thresholds, which makes the updates robust to outliers.
We propose an implementation of the algorithm using Huber-type quantile estimates.
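A small sketch of quantile-based clipping for robust (non-private) SGD, using a standard online quantile tracker as a stand-in for the paper's estimator, which is not described above; the quantile level and tracking rate are assumptions.

```python
import numpy as np

def rolling_quantile_clip_step(w, grad, tau, lr=0.1, q=0.9, eta=0.05):
    """Clip `grad` at the current threshold `tau`, then nudge `tau` toward the
    q-quantile of observed gradient norms with a Robbins-Monro update.

    Sketch under assumed constants; the cited paper's estimator may differ.
    """
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, tau / (norm + 1e-12))
    w_new = w - lr * clipped
    # online quantile tracking: move tau up when the norm exceeds it, down otherwise
    tau_new = tau + eta * (q - float(norm <= tau))
    return w_new, tau_new
```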
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Dimension Independent Generalization of DP-SGD for Overparameterized
Smooth Convex Optimization [24.644583626705742]
This paper considers the generalization performance of differentially private convex learning.
We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD.
arXiv Detail & Related papers (2022-06-03T22:03:05Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
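For illustration, here is a generic clipped stochastic (sub)gradient loop of the kind this summary refers to; the 1/sqrt(t) step size and growing clipping level are placeholder schedules, not the stepsize rules proposed in that paper.

```python
import numpy as np

def clipped_sgd(grad_oracle, w0, n_steps=1000, lr0=1.0, clip0=1.0):
    """Clipped stochastic (sub)gradient method for heavy-tailed noise.

    Sketch: lr_t ~ 1/sqrt(t) and clip_t ~ sqrt(t) are placeholder schedules.
    """
    w = np.array(w0, dtype=float)
    avg = w.copy()
    for t in range(1, n_steps + 1):
        g = grad_oracle(w)
        lam = clip0 * np.sqrt(t)                   # clipping level
        g = g * min(1.0, lam / (np.linalg.norm(g) + 1e-12))
        w = w - (lr0 / np.sqrt(t)) * g
        avg = avg + (w - avg) / t                  # averaged iterate, common for convex problems
    return avg
```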
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our methods over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
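As a hedged sketch of how Nesterov-style extrapolation can be combined with a nonsmooth, $\ell_1$-regularized objective (not the paper's exact scheme), the snippet below runs an extrapolated proximal subgradient method on the hinge loss with soft-thresholding; the extrapolation weights and step sizes are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def extrapolated_hinge_l1(X, y, lam=0.01, n_steps=500, lr0=0.5):
    """Subgradient method with Nesterov-style extrapolation for the
    l1-regularized hinge loss; all schedules are placeholder assumptions."""
    n, d = X.shape
    w, w_prev = np.zeros(d), np.zeros(d)
    for t in range(1, n_steps + 1):
        beta = (t - 1) / (t + 2)                     # extrapolation weight
        z = w + beta * (w - w_prev)                  # extrapolated point
        margins = y * (X @ z)
        # subgradient of the average hinge loss at z
        g = -(X * (y * (margins < 1.0))[:, None]).mean(axis=0)
        lr = lr0 / np.sqrt(t)
        w_prev, w = w, soft_threshold(z - lr * g, lr * lam)
    return w
```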