On Stochastic Moving-Average Estimators for Non-Convex Optimization
- URL: http://arxiv.org/abs/2104.14840v1
- Date: Fri, 30 Apr 2021 08:50:24 GMT
- Title: On Stochastic Moving-Average Estimators for Non-Convex Optimization
- Authors: Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang
- Abstract summary: In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems.
Using the variance recursion property of SEMA, we analyze methods for standard non-convex minimization, non-convex strongly-concave min-max optimization, and stochastic bilevel optimization, establishing improved or state-of-the-art oracle complexities.
- Score: 105.22760323075008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we demonstrate the power of a widely used stochastic estimator
based on moving average (SEMA) on a range of stochastic non-convex optimization
problems, which only requires {\bf a general unbiased stochastic oracle}. We
analyze various stochastic methods (existing or newly proposed) based on the
{\bf variance recursion property} of SEMA for three families of non-convex
optimization, namely standard stochastic non-convex minimization, stochastic
non-convex strongly-concave min-max optimization, and stochastic bilevel
optimization. Our contributions include: (i) for standard stochastic non-convex
minimization, we present a simple and intuitive proof of convergence for a
family of Adam-style methods (including Adam) with an increasing or large
"momentum" parameter for the first-order moment, which gives an alternative yet
more natural way to guarantee that Adam converges; (ii) for stochastic non-convex
strongly-concave min-max optimization, we present a single-loop stochastic
gradient descent ascent method based on the moving average estimators and
establish its oracle complexity of $O(1/\epsilon^4)$ without using a large
mini-batch size, addressing a gap in the literature; (iii) for stochastic
bilevel optimization, we present a single-loop stochastic method based on the
moving average estimators and establish its oracle complexity of $\widetilde
O(1/\epsilon^4)$ without computing the inverse or SVD of the Hessian matrix,
improving state-of-the-art results. For all these problems, we also establish a
variance diminishing result for the stochastic gradient estimators used.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions [26.543628010637036]
We introduce a novel adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(\log T)$ for non-convex functions.
We also extend the proposed technique to obtain the same optimal rate of $\mathcal{O}(\log T)$ for compositional optimization.
arXiv Detail & Related papers (2024-06-04T04:39:51Z) - Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax
Problems [39.197569803430646]
Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training.
Although recently a wide variety of optimization methods have been proposed to solve the minimax problems, most of them ignore the distributed setting.
arXiv Detail & Related papers (2023-04-21T11:38:41Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non-convex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method that achieved the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient updates with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - An Optimal Hybrid Variance-Reduced Algorithm for Stochastic Composite
Nonconvex Optimization [23.355249183979907]
We propose a new variant of the hybrid variance-reduced method in [7] to solve a common composite non-convex optimization problem under standard assumptions.
We simply replace the independent unbiased estimator in our estimator introduced in [7] by the stochastic gradient evaluated at the same sample.
Our analysis is essentially inspired by [7], but we do not use two different step-sizes.
arXiv Detail & Related papers (2020-08-20T16:15:12Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity guarantees for constant and variable stepsize iteration policies, obtaining that, for minibatch size $N$, after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations, $\epsilon$-suboptimality is attained in expected quadratic distance to the optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)