Fast Dimension Independent Private AdaGrad on Publicly Estimated
Subspaces
- URL: http://arxiv.org/abs/2008.06570v2
- Date: Sat, 30 Jan 2021 23:34:27 GMT
- Title: Fast Dimension Independent Private AdaGrad on Publicly Estimated
Subspaces
- Authors: Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta
- Abstract summary: We show that noisy AdaGrad achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise.
We operate with general convex functions in both constrained and unconstrained minimization.
- Score: 24.52697154861819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of empirical risk minimization (ERM) with differential
privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions
on the subspace from which gradients can be drawn, achieves a regret comparable
to traditional AdaGrad plus a well-controlled term due to noise. We show a
convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of
the gradient subspace. Since $\text{Tr}(G_T)=O(\sqrt{T})$ we can obtain faster
rates for convex and Lipschitz functions, compared to the $O(1/\sqrt{T})$ rate
achieved by known versions of noisy (stochastic) gradient descent with
comparable noise variance. In particular, we show that if the gradients lie in
a known constant rank subspace, and assuming algorithmic access to an envelope
which bounds decaying sensitivity, one can achieve faster convergence to an
excess empirical risk of $\tilde O(1/\epsilon n)$, where $\epsilon$ is the
privacy budget and $n$ the number of samples. Letting $p$ be the problem
dimension, this result implies that, by running noisy AdaGrad, we can bypass
the DP-SGD bound $\tilde O(\sqrt{p}/\epsilon n)$ in $T=(\epsilon
n)^{2/(1+2\alpha)}$ iterations, where $\alpha \geq 0$ is a parameter
controlling gradient norm decay, instead of the rate achieved by SGD of
$T=\epsilon^2n^2$. Our results operate with general convex functions in both
constrained and unconstrained minimization.
Along the way, we do a perturbation analysis of noisy AdaGrad of independent
interest. Our utility guarantee for the private ERM problem follows as a
corollary to the regret guarantee of noisy AdaGrad.
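For intuition, the sketch below shows one way a noisy (differentially private style) diagonal AdaGrad loop can be written in Python/NumPy: each gradient is clipped to bound its sensitivity, perturbed with Gaussian noise, and only then fed into the AdaGrad accumulator and step. This is a minimal illustrative sketch, not the paper's algorithm: the function name noisy_adagrad and the parameters clip_norm and sigma are assumptions for illustration, the projection onto a publicly estimated gradient subspace and the decaying-sensitivity envelope from the abstract are omitted, and no formal $(\epsilon,\delta)$ accounting is performed.
```python
import numpy as np

def noisy_adagrad(grad_fn, w0, T, lr=0.1, clip_norm=1.0, sigma=1.0, eps=1e-8, seed=0):
    """Illustrative diagonal AdaGrad with clipped, Gaussian-perturbed gradients.

    grad_fn(w, t) returns the (sub)gradient at iterate w on step t.
    Clipping bounds the per-step L2 sensitivity; Gaussian noise with standard
    deviation sigma * clip_norm is added before the accumulator update.
    This omits the subspace projection used in the paper and does not
    constitute a full privacy analysis.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    accum = np.zeros_like(w)  # running sum of squared noisy gradients
    for t in range(T):
        g = np.asarray(grad_fn(w, t), dtype=float)
        # Clip to bound per-step sensitivity.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        # Gaussian perturbation, with noise scale tied to the clip norm.
        g_noisy = g + rng.normal(scale=sigma * clip_norm, size=g.shape)
        accum += g_noisy ** 2
        # Coordinate-wise AdaGrad step using the noisy preconditioner.
        w -= lr * g_noisy / (np.sqrt(accum) + eps)
    return w

# Toy usage: a quadratic whose gradients lie in a rank-2 subspace of R^10.
if __name__ == "__main__":
    A = np.zeros((10, 10))
    A[:2, :2] = np.array([[2.0, 0.0], [0.0, 1.0]])
    w_hat = noisy_adagrad(lambda w, t: A @ w, w0=np.ones(10), T=500, sigma=0.1)
    print(np.round(w_hat, 3))
```
The toy objective is chosen so that the clean gradients span a constant-rank subspace, mirroring the low-rank setting in which the paper's faster rates apply.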
Related papers
- Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD [16.019880089338383]
We show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error bound that scales with $\sqrt{\mathsf{Tr}(\Sigma)}$.
arXiv Detail & Related papers (2024-10-26T10:14:17Z) - Mirror Descent Algorithms with Nearly Dimension-Independent Rates for
Differentially-Private Stochastic Saddle-Point Problems [6.431793114484429]
We propose algorithms to solve differentially-private stochastic saddle-point problems in the polyhedral setting.
We show that our algorithms attain a rate of $\sqrt{\log(d)}/\sqrt{n} + \log(d)/[n\varepsilon]^{2/5}$ with constant success probability.
arXiv Detail & Related papers (2024-03-05T12:28:00Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \tau_{\mathrm{avg}}\epsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high probability generalization bound of order $O\!\left(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2\right)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability result for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Cascading Bandit under Differential Privacy [21.936577816668944]
We study differential privacy (DP) and local differential privacy (LDP) in cascading bandits.
Under DP, we propose an algorithm which guarantees $\epsilon$-indistinguishability and a regret of $\mathcal{O}\left(\frac{\log^3 T}{\epsilon}\right)$ for an arbitrarily small $\xi$.
Under $(\epsilon,\delta)$-LDP, we relax the $K^2$ dependence through the tradeoff between privacy budget $\epsilon$ and error probability $\delta$.
arXiv Detail & Related papers (2021-05-24T07:19:01Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + \left(\log(d)/(\varepsilon n)\right)^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Differentially Private SGD with Non-Smooth Loss [26.212935426509908]
The loss function is relaxed to have an $\alpha$-Hölder continuous gradient.
We prove that noisy SGD with $\alpha$-Hölder smooth losses using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy.
arXiv Detail & Related papers (2021-01-22T03:19:06Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.