ReSQueing Parallel and Private Stochastic Convex Optimization
- URL: http://arxiv.org/abs/2301.00457v2
- Date: Fri, 27 Oct 2023 06:10:41 GMT
- Title: ReSQueing Parallel and Private Stochastic Convex Optimization
- Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu,
Aaron Sidford, Kevin Tian
- Abstract summary: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
- Score: 59.53297063174519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new tool for stochastic convex optimization (SCO): a
Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function
convolved with a (Gaussian) probability density. Combining ReSQue with recent
advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop
algorithms achieving state-of-the-art complexities for SCO in parallel and
private settings. For an SCO objective constrained to the unit ball in
$\mathbb{R}^d$, we obtain the following results (up to polylogarithmic
factors). We give a parallel algorithm obtaining optimization error
$\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient
oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} +
\epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a
bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in
[d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of
[BJLLS19] while maintaining the optimal total work of stochastic gradient
descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19,
BFGT20, AFKT21, KLL21] established that if $n \gtrsim d
\epsilon_{\text{dp}}^{-2}$, $(\epsilon_{\text{dp}}, \delta)$-differential
privacy is attained at no asymptotic cost to the SCO utility. However, these
prior works all required a superlinear number of gradient queries. We close
this gap for sufficiently large $n \gtrsim d^2 \epsilon_{\text{dp}}^{-3}$, by
using ReSQue to design an algorithm with near-linear gradient query complexity
in this regime.
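To make the estimator concrete, below is a minimal Python sketch of the reweighting idea behind ReSQue: queries drawn from a Gaussian centered at a shared reference point are importance-weighted by a density ratio so that, in expectation, they recover the gradient of the Gaussian-smoothed objective at a nearby point. The function names and interface here are illustrative assumptions, and the sketch omits the clipping, batching, and ball-acceleration machinery the paper builds around the estimator.

```python
import numpy as np

def resque_gradient(grad_oracle, x, center, rho, num_samples=16):
    """Illustrative ReSQue-style estimator (hypothetical interface).

    Estimates the gradient at `x` of f convolved with N(0, rho^2 I),
    using stochastic gradients queried at points sampled around a
    shared reference `center`. Since
        grad (f * gamma_rho)(x) = E_{z ~ N(x, rho^2 I)}[grad f(z)],
    reweighting samples z ~ N(center, rho^2 I) by the density ratio
    N(z; x, rho^2 I) / N(z; center, rho^2 I) keeps the estimate
    unbiased while letting many nearby points x reuse the same queries.
    """
    d = x.shape[0]
    total = np.zeros(d)
    for _ in range(num_samples):
        z = center + rho * np.random.randn(d)  # query point near the center
        # log of the Gaussian density ratio N(z; x) / N(z; center)
        log_w = (np.sum((z - center) ** 2) - np.sum((z - x) ** 2)) / (2 * rho**2)
        total += np.exp(log_w) * grad_oracle(z)  # reweighted stochastic gradient
    return total / num_samples
```

The payoff in the parallel setting is that every point in a small ball around `center` can be served by the same batch of queries, which is what lets the gradient oracle query depth fall below the sequential iteration count.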
Related papers
- Quantum Algorithms for Non-smooth Non-convex Optimization [30.576546266390714]
This paper considers the problem of finding a $(\delta,\epsilon)$-Goldstein stationary point of a Lipschitz continuous objective.
We construct a zeroth-order quantum estimator for the surrogate oracle function.
arXiv Detail & Related papers (2024-10-21T16:52:26Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established an $O(\epsilon^{-3})$ sample complexity for obtaining an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we address both issues simultaneously by revisiting a simple variant of the STORM algorithm (a minimal sketch of the STORM update appears after this list).
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_2(W)})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method that solves such problems with a reduced number of master-to-agents communications and local gradient calls.
The proposed method is much sharper than existing $\mathcal{O}(\sqrt{L_q/\mu})$-complexity methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted stochastic gradient tracking algorithm ($\texttt{DoCoM}$) for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an error bound of order $\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error of a learning model and thus sufficient for generalization.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
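As a companion to the STORM-variant entry above (Near-Optimal Non-Convex Stochastic Optimization under Generalized Smoothness), here is a minimal sketch of the recursive momentum estimator at the core of STORM. The interface and constant step sizes are assumptions for illustration; the cited paper's variant differs in its step-size schedule and analysis.

```python
import numpy as np

def storm(grad_oracle, x0, steps=100, lr=0.05, a=0.1):
    """Illustrative STORM-style recursive variance reduction.

    `grad_oracle(x, seed)` returns a stochastic gradient at x; reusing
    the same seed at consecutive iterates lets the correction term
    cancel shared sampling noise, avoiding large batches.
    (Hypothetical interface; constants are placeholders.)
    """
    x_prev, x = x0.copy(), x0.copy()
    d = None
    for t in range(steps):
        g = grad_oracle(x, seed=t)
        if d is None:
            d = g  # first iterate: plain stochastic gradient
        else:
            # d_t = g(x_t) + (1 - a) * (d_{t-1} - g(x_{t-1})), same sample at both points
            d = g + (1 - a) * (d - grad_oracle(x_prev, seed=t))
        x_prev, x = x, x - lr * d
    return x
```

Because each update reuses a single fresh sample at two consecutive iterates, the estimator's variance shrinks without the $\mathrm{poly}(\epsilon^{-1})$ batch sizes that the earlier works required.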