Faster Perturbed Stochastic Gradient Methods for Finding Local Minima
- URL: http://arxiv.org/abs/2110.13144v1
- Date: Mon, 25 Oct 2021 07:20:05 GMT
- Title: Faster Perturbed Stochastic Gradient Methods for Finding Local Minima
- Authors: Zixiang Chen and Dongruo Zhou and Quanquan Gu
- Abstract summary: We propose \texttt{Pullback}, a faster perturbed stochastic gradient framework for finding local minima.
We show that Pullback with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_H)$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_H^{-6})$ stochastic gradient evaluations.
The core idea of our framework is a step-size ``pullback'' scheme to control the average movement of the iterates.
- Score: 92.99933928528797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Escaping from saddle points and finding local minima is a central problem in
nonconvex optimization. Perturbed gradient methods are perhaps the simplest
approach for this problem. However, to find $(\epsilon,
\sqrt{\epsilon})$-approximate local minima, the existing best stochastic
gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$,
which is not optimal. In this paper, we propose \texttt{Pullback}, a faster
perturbed stochastic gradient framework for finding local minima. We show that
Pullback with stochastic gradient estimators such as SARAH/SPIDER and STORM can
find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde
O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or
$\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea
of our framework is a step-size ``pullback'' scheme to control the average
movement of the iterates, which leads to faster convergence to the local
minima. Experiments on matrix factorization problems corroborate our theory.
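To make the ``pullback'' idea concrete, below is a minimal, illustrative Python sketch of one perturbed stochastic gradient iteration whose per-step movement is capped at a radius r before a small isotropic perturbation is added. The estimator `grad_est`, the radius `r`, and all hyperparameters are assumptions for illustration; this is a conceptual sketch, not the authors' exact algorithm.

```python
import numpy as np

def perturbed_pullback_step(x, grad_est, step_size, r, perturb_std, rng):
    """One illustrative iteration: take a stochastic gradient step, 'pull back'
    the step so the iterate's movement stays within radius r, and add a small
    isotropic perturbation to help escape saddle points (sketch only)."""
    g = grad_est(x)                      # stochastic gradient estimate (e.g., SPIDER/STORM-style)
    step = step_size * g
    move = np.linalg.norm(step)
    if move > r:                         # pullback: cap the movement of this iterate at r
        step *= r / move
    noise = perturb_std * rng.standard_normal(x.shape)
    return x - step + noise

# Toy usage on f(x) = 0.5 * ||x||^2 with noisy gradients (illustrative only)
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
grad_est = lambda z: z + 0.01 * rng.standard_normal(z.shape)
for _ in range(100):
    x = perturbed_pullback_step(x, grad_est, step_size=0.1, r=0.05,
                                perturb_std=1e-3, rng=rng)
```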
Related papers
- Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with gradient complexity $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$, matching the lower bound.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Saddle Point Optimization with Approximate Minimization Oracle [8.680676599607125]
A major approach to saddle point optimization $\min_x\max_y f(x, y)$ is a gradient-based approach, as popularized by generative adversarial networks (GANs).
In contrast, we analyze an alternative approach relying only on an oracle that solves a minimization problem approximately.
Our approach locates approximate solutions $x'$ and $y'$ to $\min_{x'} f(x', y)$ and $\max_{y'} f(x, y')$ at a given point $(x, y)$ and updates $(x, y)$ toward these approximate solutions $(x', y')$ (a minimal sketch of this type of update appears after this list).
arXiv Detail & Related papers (2021-03-29T23:03:24Z) - Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to
Minimax Optimization [133.53164856723782]
We propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained.
Meanwhile, we propose an accelerated zeroth-order momentum descent (Acc-MDA) method for black-box minimax optimization, where only function values can be obtained.
In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size $O(\kappa_y^4)$.
arXiv Detail & Related papers (2020-08-18T22:19:29Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of the gradient-free minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Stochastic Recursive Gradient Descent Ascent for Stochastic
Nonconvex-Strongly-Concave Minimax Problems [36.645753881826955]
In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction.
This method achieves the best known complexity for this problem.
arXiv Detail & Related papers (2020-01-11T09:05:03Z)
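For the ``Saddle Point Optimization with Approximate Minimization Oracle'' entry above, the following is a minimal sketch of an oracle-based update for $\min_x\max_y f(x, y)$: approximately solve the inner minimization and maximization at the current point, then move a fraction `eta` toward those solutions. The choice of oracle (`scipy.optimize.minimize`) and the step fraction `eta` are assumptions for illustration, not that paper's exact method.

```python
import numpy as np
from scipy.optimize import minimize

def oracle_saddle_step(f, x, y, eta=0.5):
    """Illustrative update for min_x max_y f(x, y): obtain approximate solutions
    x' of min_{x'} f(x', y) and y' of max_{y'} f(x, y'), then move (x, y) a
    fraction eta toward (x', y') (sketch; oracle and eta are assumptions)."""
    x_prime = minimize(lambda u: f(u, y), x).x    # approximate minimization oracle in x
    y_prime = minimize(lambda v: -f(x, v), y).x   # approximate maximization oracle in y
    return x + eta * (x_prime - x), y + eta * (y_prime - y)

# Toy usage on the saddle f(x, y) = 0.5*||x||^2 + <x, y> - 0.5*||y||^2
f = lambda x, y: 0.5 * x @ x + x @ y - 0.5 * y @ y
x, y = np.array([1.0]), np.array([1.0])
for _ in range(50):
    x, y = oracle_saddle_step(f, x, y)   # iterates move toward the saddle point (0, 0)
```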