ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method
- URL: http://arxiv.org/abs/2103.11333v1
- Date: Sun, 21 Mar 2021 08:14:40 GMT
- Title: ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method
- Authors: Zhize Li
- Abstract summary: We propose a novel accelerated variance-reduced method called ANITA for finite-sum optimization.
In the general convex setting, ANITA achieves the convergence result $O\big(n\min\big\{1+\log\frac{1}{\epsilon\sqrt{n}}, \log\sqrt{n}\big\} + \sqrt{\frac{nL}{\epsilon}}\big)$.
In the strongly convex setting, we also show that ANITA can achieve the optimal convergence result $O\Big(\big(n+\sqrt{\frac{nL}{\mu}}\big)\log\frac{1}{\epsilon}\Big)$ matching the lower bound $\Omega\Big(\big(n+\sqrt{\frac{nL}{\mu}}\big)\log\frac{1}{\epsilon}\Big)$.
- Score: 17.259824817932294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel accelerated variance-reduced gradient method called ANITA
for finite-sum optimization. In this paper, we consider both general convex and
strongly convex settings. In the general convex setting, ANITA achieves the
convergence result $O\big(n\min\big\{1+\log\frac{1}{\epsilon\sqrt{n}},
\log\sqrt{n}\big\} + \sqrt{\frac{nL}{\epsilon}} \big)$, which improves the
previous best result $O\big(n\min\{\log\frac{1}{\epsilon}, \log
n\}+\sqrt{\frac{nL}{\epsilon}}\big)$ given by Varag (Lan et al., 2019). In
particular, for a very wide range of $\epsilon$, i.e., $\epsilon \in
(0,\frac{L}{n\log^2\sqrt{n}}]\cup [\frac{1}{\sqrt{n}},+\infty)$, where
$\epsilon$ is the error tolerance $f(x_T)-f^*\leq \epsilon$ and $n$ is the
number of data samples, ANITA can achieve the optimal convergence result
$O\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$ matching the lower bound
$\Omega\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$ provided by Woodworth and Srebro
(2016). To the best of our knowledge, ANITA is the \emph{first} accelerated
algorithm which can \emph{exactly} achieve this optimal result
$O\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$ for general convex finite-sum
problems. In the strongly convex setting, we also show that ANITA can achieve
the optimal convergence result
$O\Big(\big(n+\sqrt{\frac{nL}{\mu}}\big)\log\frac{1}{\epsilon}\Big)$ matching
the lower bound
$\Omega\Big(\big(n+\sqrt{\frac{nL}{\mu}}\big)\log\frac{1}{\epsilon}\Big)$
provided by Lan and Zhou (2015). Moreover, ANITA enjoys a simpler loopless
algorithmic structure, unlike previous accelerated algorithms such as Katyusha
(Allen-Zhu, 2017) and Varag (Lan et al., 2019), which use an inconvenient
double-loop structure. Finally, experimental results show that ANITA converges
faster than the previous state-of-the-art Varag (Lan et al., 2019), validating
our theoretical results and confirming the practical superiority of ANITA.
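To make the "loopless" structure concrete, here is a minimal sketch, assuming a least-squares finite sum and a generic L-SVRG-style update: it is not the actual ANITA method, only an illustration of how a probabilistic full-gradient refresh inside a single loop replaces the outer loop used by double-loop methods. The function name `loopless_vr_gd`, the quadratic objective, and the step-size and refresh-probability choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def loopless_vr_gd(A, b, lr=0.05, p=None, iters=2000, seed=0):
    """Sketch of a loopless variance-reduced gradient method (L-SVRG-style).

    Minimizes f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
    The full-gradient snapshot is refreshed by a coin flip inside the single
    loop, instead of in a separate outer loop over epochs.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if p is None:
        p = 1.0 / n                            # a typical refresh probability
    x = np.zeros(d)
    w_ref = x.copy()                           # snapshot (reference) point
    full_grad = A.T @ (A @ w_ref - b) / n      # full gradient at the snapshot
    for _ in range(iters):
        i = rng.integers(n)
        # variance-reduced stochastic gradient estimator
        g = A[i] * (A[i] @ x - b[i]) - A[i] * (A[i] @ w_ref - b[i]) + full_grad
        x = x - lr * g
        if rng.random() < p:                   # loopless: refresh snapshot by coin flip
            w_ref = x.copy()
            full_grad = A.T @ (A @ w_ref - b) / n
    return x
```

For example, `loopless_vr_gd(np.random.randn(200, 10), np.random.randn(200))` runs a single flat loop; a double-loop method would instead wrap the inner updates in an explicit outer loop that recomputes the full gradient every epoch.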
Related papers
- A note on quantum lower bounds for local search via congestion and expansion [4.68073705539907]
We show that the quantum query complexity of local search on $G$ is $\Omega\bigl(\frac{n^{3/4}}{\sqrt{g}}\bigr)$.
In contrast to the classical setting, a gap remains in the quantum case between our lower bound and the best-known upper bound of $O\bigl(n^{1/3}\bigr)$ for such graphs.
arXiv Detail & Related papers (2024-12-17T21:42:42Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Faster Stochastic Algorithms for Minimax Optimization under
Polyak--{\L}ojasiewicz Conditions [12.459354707528819]
We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n}\sum_{i=1}^n f_i(x,y)$.
We prove SPIDER-GDA could find an $\epsilon$-optimal solution within $\mathcal{O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log(1/\epsilon)\right)$ stochastic first-order oracle calls.
arXiv Detail & Related papers (2023-07-29T02:26:31Z) - Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function
Approximation: Minimax Optimal and Instance-Dependent Regret Bounds [26.277745106128197]
In this work, we address the challenge of such rewards in reinforcement learning with linear function approximation.
We first design an algorithm, \textsc{Heavy-OFUL}, for heavy-tailed linear bandits, achieving an \emph{instance-dependent} $T$-round regret bound.
Our result is achieved via a novel self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems.
arXiv Detail & Related papers (2023-06-12T02:56:09Z) - Faster Rates of Convergence to Stationary Points in Differentially
Private Optimization [31.46358820048179]
We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,\delta)$-differential privacy (DP).
A point $\widehat{w}$ is called an $\alpha$-stationary point of a function $F:\mathbb{R}^d\rightarrow\mathbb{R}$ if $\|\nabla F(\widehat{w})\|\leq \alpha$.
We provide a new efficient algorithm that finds an $\alpha$-stationary point at a faster rate than previously known.
arXiv Detail & Related papers (2022-06-02T02:43:44Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/\varepsilon n)^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an optimization error of $\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $\ell\geq 1$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z) - Acceleration for Compressed Gradient Descent in Distributed and
Federated Optimization [31.066505042748826]
We propose the first accelerated compressed gradient descent (ACGD) methods.
We show that ACGD enjoys the rate $O\Big((1+\omega)\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\Big)$ for $\mu$-strongly convex problems.
We also propose a distributed variant of ACGD (called ADIANA) and prove the convergence rate $\widetilde{O}\Big(\omega+\sqrt{\frac{L}{\mu}}+\sqrt{\big(\frac{\omega}{n}+\sqrt{\frac{\omega}{n}}\big)\frac{\omega L}{\mu}}\Big)$.
arXiv Detail & Related papers (2020-02-26T09:03:23Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved complexity $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$.
Our communication complexities of the accelerated EXTRA are only worse than the corresponding lower bounds by factors of $\left(\log\frac{L}{\mu(1-\sigma_2(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$, respectively.
arXiv Detail & Related papers (2020-02-24T08:07:08Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both \emph{compatible} with oracles and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)