Faster Rates of Convergence to Stationary Points in Differentially
Private Optimization
- URL: http://arxiv.org/abs/2206.00846v2
- Date: Tue, 30 May 2023 20:45:31 GMT
- Title: Faster Rates of Convergence to Stationary Points in Differentially
Private Optimization
- Authors: Raman Arora, Raef Bassily, Tomás González, Cristóbal Guzmán,
Michael Menart, Enayat Ullah
- Abstract summary: We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,\delta)$-differential privacy (DP).
A point $\widehat{w}$ is called an $\alpha$-stationary point of a function $F:\mathbb{R}^d\rightarrow\mathbb{R}$ if $\|\nabla F(\widehat{w})\|\leq \alpha$.
We provide a new efficient algorithm that finds an $\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{2/3}\big)$-stationary point in the finite-sum setting.
- Score: 31.46358820048179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of approximating stationary points of Lipschitz and
smooth functions under $(\varepsilon,\delta)$-differential privacy (DP) in both
the finite-sum and stochastic settings. A point $\widehat{w}$ is called an
$\alpha$-stationary point of a function $F:\mathbb{R}^d\rightarrow\mathbb{R}$
if $\|\nabla F(\widehat{w})\|\leq \alpha$. We provide a new efficient algorithm
that finds an
$\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{2/3}\big)$-stationary
point in the finite-sum setting, where $n$ is the number of samples. This
improves on the previous best rate of
$\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$. We also
give a new construction that improves over the existing rates in the stochastic
optimization setting, where the goal is to find approximate stationary points
of the population risk. Our construction finds a
$\tilde{O}\big(\frac{1}{n^{1/3}} +
\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$-stationary point of the
population risk in time linear in $n$. Furthermore, under the additional
assumption of convexity, we completely characterize the sample complexity of
finding stationary points of the population risk (up to polylog factors) and
show that the optimal rate on population stationarity is $\tilde
\Theta\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\varepsilon}\big)$. Finally, we
show that our methods can be used to provide dimension-independent rates of
$O\big(\frac{1}{\sqrt{n}}+\min\big(\big[\frac{\sqrt{\mathrm{rank}}}{n\varepsilon}\big]^{2/3},\frac{1}{(n\varepsilon)^{2/5}}\big)\big)$
on population stationarity for Generalized Linear Models (GLM), where $\mathrm{rank}$ is
the rank of the design matrix, which improves upon the previous best known
rate.
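
To make the problem setting concrete, below is a minimal illustrative sketch of differentially private noisy gradient descent with the Gaussian mechanism, a standard baseline for this kind of problem; it is not the algorithm from the paper. Per-sample gradients are clipped, averaged, and perturbed with noise scaled to the clipping norm, the sample size $n$, and the budget $(\varepsilon,\delta)$. The function name, noise-calibration constants, step size, and iteration count are illustrative assumptions.

import numpy as np

def dp_noisy_gd(per_sample_grads, w0, n, eps, delta, T=200, eta=0.1, clip=1.0, seed=0):
    # Illustrative (eps, delta)-DP noisy gradient descent; not the paper's algorithm.
    # per_sample_grads(w) must return an (n, d) array of per-sample gradients.
    rng = np.random.default_rng(seed)
    # Loose advanced-composition-style noise scale for T adaptive releases of a
    # clipped gradient mean; the constant 8 is an assumption for illustration.
    sigma = clip * np.sqrt(8.0 * T * np.log(1.25 / delta)) / (n * eps)
    w = np.asarray(w0, dtype=float)
    best_w, best_norm = w.copy(), np.inf
    for _ in range(T):
        G = per_sample_grads(w)                      # shape (n, d)
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        G = G / np.maximum(1.0, norms / clip)        # clip each per-sample gradient
        g = G.mean(axis=0) + rng.normal(0.0, sigma, size=w.shape)
        if np.linalg.norm(g) < best_norm:            # track iterate with smallest noisy gradient
            best_norm, best_w = np.linalg.norm(g), w.copy()
        w = w - eta * g
    return best_w

For instance, for least squares $F(w)=\frac{1}{2n}\|Xw-y\|^2$, per_sample_grads(w) would return the rows $x_i(x_i^\top w - y_i)$ stacked into an $(n, d)$ array.
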
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle calls (FOs) that stochastic gradient descent requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$ for the averaged $\ell_1$ norm of the gradient, $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\big[\|\nabla f(x^k)\|_1\big]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
This rate can be viewed as analogous to the classical $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\big[\|\nabla f(x^k)\|_2^2\big]\leq O\big(\frac{1}{\sqrt{T}}\big)$ guarantee measured by the squared $\ell_2$ norm (a minimal sketch of the RMSProp update appears after this list).
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates [33.46648653842542]
We study the private empirical risk minimization problem for losses satisfying the $(\gamma,\kappa)$-Kurdyka-Łojasiewicz (KL) condition.
We show it is possible to achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\kappa}\big)$ with a private implementation of the proximal point method.
arXiv Detail & Related papers (2023-11-22T15:12:42Z) - Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function
Approximation: Minimax Optimal and Instance-Dependent Regret Bounds [26.277745106128197]
In this work, we address the challenge of such rewards in reinforcement learning with linear function approximation.
We first design an algorithm, Heavy-OFUL, for heavy-tailed linear bandits, achieving an instance-dependent $T$-round regret bound.
Our result is achieved via a novel self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems.
arXiv Detail & Related papers (2023-06-12T02:56:09Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study functions of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\cdot\mathrm{OPT}+\epsilon$ with high probability (a plain gradient-descent sketch for this setting appears after this list).
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Faster Rates of Differentially Private Stochastic Convex Optimization [7.93728520583825]
We study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with parameter $\theta>1$.
In the second part, we focus on a special case where the population risk function is strongly convex.
arXiv Detail & Related papers (2021-07-31T22:23:39Z) - Differentially Private Stochastic Optimization: New Results in Convex
and Non-Convex Settings [15.122617358656539]
We present differentially private algorithms for stochastic optimization in both convex and non-convex settings.
For the convex case, we focus on the family of non-smooth generalized linear losses (GLLs).
arXiv Detail & Related papers (2021-07-12T17:06:08Z) - Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha$ and $\lambda$ denote the Hölder order and constant, respectively.
arXiv Detail & Related papers (2021-02-28T13:15:55Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
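
For reference, the RMSProp-with-momentum update analyzed in the $O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$ convergence-rate entry above takes roughly the following standard form. The particular variant (momentum applied to the preconditioned step), the hyperparameter values, and the damping constant are assumptions for illustration, not details taken from that paper.

import numpy as np

def rmsprop_momentum(grad_fn, w0, T=1000, eta=1e-3, beta=0.999, mu=0.9, damping=1e-8):
    # Standard RMSProp with heavy-ball-style momentum; illustrative variant only.
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)   # running average of squared gradients
    m = np.zeros_like(w)   # momentum buffer
    for _ in range(T):
        g = grad_fn(w)
        v = beta * v + (1.0 - beta) * g * g            # second-moment estimate
        m = mu * m + eta * g / (np.sqrt(v) + damping)  # momentum on the preconditioned step
        w = w - m
    return w

# Example: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w_min = rmsprop_momentum(lambda w: w, np.ones(5))
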
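Similarly, as a companion to the single-neuron entry above, here is a minimal sketch of plain gradient descent on the empirical square loss of a single neuron $\sigma(\mathbf{w}\cdot\mathbf{x})$ with a monotone activation. It only illustrates the setting and is not the algorithm or analysis from that paper; the ReLU activation, step size, and synthetic data are assumptions.

import numpy as np

def fit_single_neuron(X, y, act, dact, T=500, eta=0.1, seed=0):
    # Gradient descent on F(w) = (1/2n) * sum_i (act(w . x_i) - y_i)^2.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d) / np.sqrt(d)
    for _ in range(T):
        z = X @ w
        residual = act(z) - y                       # shape (n,)
        grad = X.T @ (residual * dact(z)) / n       # chain rule through the activation
        w = w - eta * grad
    return w

# Example with the (monotone) ReLU activation on synthetic data.
relu = lambda z: np.maximum(z, 0.0)
drelu = lambda z: (z > 0).astype(float)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = relu(X @ rng.normal(size=5))
w_hat = fit_single_neuron(X, y, relu, drelu)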