Fast Dimension Independent Private AdaGrad on Publicly Estimated
Subspaces
- URL: http://arxiv.org/abs/2008.06570v2
- Date: Sat, 30 Jan 2021 23:34:27 GMT
- Title: Fast Dimension Independent Private AdaGrad on Publicly Estimated
Subspaces
- Authors: Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta
- Abstract summary: We show that noisy AdaGrad achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise.
We operate with general convex functions in both constrained and unconstrained minimization.
- Score: 24.52697154861819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of empirical risk minimization (ERM) with differential
privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions
on the subspace from which gradients can be drawn, achieves a regret comparable
to traditional AdaGrad plus a well-controlled term due to noise. We show a
convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of
the gradient subspace. Since $\text{Tr}(G_T)=O(\sqrt{T})$ we can obtain faster
rates for convex and Lipschitz functions, compared to the $O(1/\sqrt{T})$ rate
achieved by known versions of noisy (stochastic) gradient descent with
comparable noise variance. In particular, we show that if the gradients lie in
a known constant rank subspace, and assuming algorithmic access to an envelope
which bounds decaying sensitivity, one can achieve faster convergence to an
excess empirical risk of $\tilde O(1/\epsilon n)$, where $\epsilon$ is the
privacy budget and $n$ the number of samples. Letting $p$ be the problem
dimension, this result implies that, by running noisy AdaGrad, we can bypass
the DP-SGD bound $\tilde O(\sqrt{p}/\epsilon n)$ in $T=(\epsilon
n)^{2/(1+2\alpha)}$ iterations, where $\alpha \geq 0$ is a parameter
controlling gradient norm decay, instead of the rate achieved by SGD of
$T=\epsilon^2n^2$. Our results operate with general convex functions in both
constrained and unconstrained minimization.
Along the way, we do a perturbation analysis of noisy AdaGrad of independent
interest. Our utility guarantee for the private ERM problem follows as a
corollary to the regret guarantee of noisy AdaGrad.
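For intuition, the sketch below shows one way a noisy (differentially private style) diagonal AdaGrad loop can be written in Python/NumPy: each gradient is clipped to bound its sensitivity, perturbed with Gaussian noise, and only then fed into the AdaGrad accumulator and step. This is a minimal illustrative sketch, not the paper's algorithm: the function name noisy_adagrad and the parameters clip_norm and sigma are assumptions for illustration, the projection onto a publicly estimated gradient subspace and the decaying-sensitivity envelope from the abstract are omitted, and no formal $(\epsilon,\delta)$ accounting is performed.
```python
import numpy as np

def noisy_adagrad(grad_fn, w0, T, lr=0.1, clip_norm=1.0, sigma=1.0, eps=1e-8, seed=0):
    """Illustrative diagonal AdaGrad with clipped, Gaussian-perturbed gradients.

    grad_fn(w, t) returns the (sub)gradient at iterate w on step t.
    Clipping bounds the per-step L2 sensitivity; Gaussian noise with standard
    deviation sigma * clip_norm is added before the accumulator update.
    This omits the subspace projection used in the paper and does not
    constitute a full privacy analysis.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    accum = np.zeros_like(w)  # running sum of squared noisy gradients
    for t in range(T):
        g = np.asarray(grad_fn(w, t), dtype=float)
        # Clip to bound per-step sensitivity.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        # Gaussian perturbation, with noise scale tied to the clip norm.
        g_noisy = g + rng.normal(scale=sigma * clip_norm, size=g.shape)
        accum += g_noisy ** 2
        # Coordinate-wise AdaGrad step using the noisy preconditioner.
        w -= lr * g_noisy / (np.sqrt(accum) + eps)
    return w

# Toy usage: a quadratic whose gradients lie in a rank-2 subspace of R^10.
if __name__ == "__main__":
    A = np.zeros((10, 10))
    A[:2, :2] = np.array([[2.0, 0.0], [0.0, 1.0]])
    w_hat = noisy_adagrad(lambda w, t: A @ w, w0=np.ones(10), T=500, sigma=0.1)
    print(np.round(w_hat, 3))
```
The toy objective is chosen so that the clean gradients span a constant-rank subspace, mirroring the low-rank setting in which the paper's faster rates apply.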
Related papers
- Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD [16.019880089338383]
We show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error bound that scales with $\sqrt{\mathsf{Tr}(\Sigma)}$.
arXiv Detail & Related papers (2024-10-26T10:14:17Z) - Mirror Descent Algorithms with Nearly Dimension-Independent Rates for
Differentially-Private Stochastic Saddle-Point Problems [6.431793114484429]
We propose algorithms to solve differentially-private stochastic saddle-point problems in the polyhedral setting.
We show that our algorithms attain a rate of $\sqrt{\log(d)}/\sqrt{n} + \log(d)/[n\varepsilon]^{2/5}$ with constant success probability.
arXiv Detail & Related papers (2024-03-05T12:28:00Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \tau_{\mathrm{avg}}\epsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high probability generalization bound of order $O\!\left(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2\right)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Cascading Bandit under Differential Privacy [21.936577816668944]
We study differential privacy (DP) and local differential privacy (LDP) in cascading bandits.
Under DP, we propose an algorithm which guarantees $\epsilon$-indistinguishability and a regret of $\mathcal{O}\left(\frac{\log^3 T}{\epsilon}\right)$ for an arbitrarily small $\xi$.
Under $(\epsilon,\delta)$-LDP, we relax the $K^2$ dependence through the tradeoff between privacy budget $\epsilon$ and error probability $\delta$.
arXiv Detail & Related papers (2021-05-24T07:19:01Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + \left(\log(d)/(\varepsilon n)\right)^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Differentially Private SGD with Non-Smooth Loss [26.212935426509908]
The loss function is relaxed to have an $\alpha$-Hölder continuous gradient.
We prove that noisy SGD with $\alpha$-Hölder smooth losses using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy.
arXiv Detail & Related papers (2021-01-22T03:19:06Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.