Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex
Optimization
- URL: http://arxiv.org/abs/2208.10025v1
- Date: Mon, 22 Aug 2022 02:40:35 GMT
- Title: Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex
Optimization
- Authors: Zhize Li, Jian Li
- Abstract summary: We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer.
First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+.
We show that our algorithm is almost as simple as its counterpart, and achieves similar optimal rates.
- Score: 23.447011255046835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and analyze several stochastic gradient algorithms for finding
stationary points or local minima in nonconvex, possibly with a nonsmooth
regularizer, finite-sum and online optimization problems. First, we propose a
simple proximal stochastic gradient algorithm based on variance reduction
called ProxSVRG+. We provide a clean and tight analysis of ProxSVRG+, which
shows that it outperforms the deterministic proximal gradient descent (ProxGD)
for a wide range of minibatch sizes, hence solves an open problem proposed in
Reddi et al. (2016b). Also, ProxSVRG+ uses far fewer proximal oracle calls than
ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding
full gradient computations. Then, we further propose an optimal algorithm,
called SSRGD, based on SARAH (Nguyen et al., 2017) and show that SSRGD further
improves the gradient complexity of ProxSVRG+ and achieves the optimal upper
bound, matching the known lower bound of (Fang et al., 2018; Li et al., 2021).
Moreover, we show that both ProxSVRG+ and SSRGD enjoy automatic adaptation with
local structure of the objective function such as the Polyak-\L{}ojasiewicz
(PL) condition for nonconvex functions in the finite-sum case, i.e., we prove
that both of them can automatically switch to faster global linear convergence
without any restart performed in prior work ProxSVRG (Reddi et al., 2016b).
Finally, we focus on the more challenging problem of finding an $(\epsilon,
\delta)$-local minimum instead of just finding an $\epsilon$-approximate
(first-order) stationary point (which may be some bad unstable saddle points).
We show that SSRGD can find an $(\epsilon, \delta)$-local minimum by simply
adding some random perturbations. Our algorithm is almost as simple as its
counterpart for finding stationary points, and achieves similar optimal rates.
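To make the algorithmic idea concrete, below is a minimal sketch (not the authors' reference implementation) of a proximal stochastic gradient loop with a SARAH-style recursive variance-reduced estimator, in the spirit of ProxSVRG+/SSRGD for problems of the form $\min_x \frac{1}{n}\sum_i f_i(x) + h(x)$ with a possibly nonsmooth regularizer $h$. The helper names (grad_fi, prox_h) and parameter choices (step size eta, minibatch size b, epoch length m) are illustrative assumptions, not quantities fixed by the paper; the random-perturbation step used to escape saddle points is omitted for brevity.

```python
# Minimal sketch of a proximal stochastic gradient method with a SARAH-style
# recursive variance-reduced estimator (in the spirit of ProxSVRG+/SSRGD).
# grad_fi(x, idx) should return the average gradient of f_i over indices idx;
# prox_h(x, eta) should return the proximal operator of eta*h evaluated at x.
import numpy as np

def prox_vr_sgd(x0, grad_fi, prox_h, n, eta=0.01, b=32, m=100, epochs=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Epoch anchor: full gradient in the finite-sum case; in the online
        # setting this would be replaced by a large-minibatch estimate.
        v = grad_fi(x, np.arange(n))
        x_prev, x = x, prox_h(x - eta * v, eta)
        for _ in range(m):
            idx = rng.integers(n, size=b)
            # Recursive (SARAH-style) estimator: correct the previous estimate
            # with a minibatch gradient difference instead of recomputing it.
            v = grad_fi(x, idx) - grad_fi(x_prev, idx) + v
            x_prev, x = x, prox_h(x - eta * v, eta)
    return x
```

With $h \equiv 0$ and a small random perturbation added whenever the gradient estimate becomes small, the same loop structure corresponds to the perturbed variant discussed in the abstract for finding $(\epsilon, \delta)$-local minima.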
Related papers
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax optimization algorithms have been developed to solve many machine learning problems, such as training neural networks.
In this paper, we propose two types of decentralized Riemannian minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Building on recent developments in bilevel optimization, this work gives a nonasymptotic analysis of a perturbed multi-step method for escaping saddle points.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity [30.97847679999505]
Gradient descent ascent (GDA) is the simplest single-loop algorithm for nonconvex minimax optimization.
Recent work shows inferior convergence rates of GDA in theory, despite its popularity in applications such as generative adversarial networks (GANs).
Two alternative single-loop algorithms -- alternating GDA and smoothed GDA -- are presented.
arXiv Detail & Related papers (2021-12-10T15:35:42Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the upper-level iterations caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)