On Stochastic Moving-Average Estimators for Non-Convex Optimization
- URL: http://arxiv.org/abs/2104.14840v1
- Date: Fri, 30 Apr 2021 08:50:24 GMT
- Title: On Stochastic Moving-Average Estimators for Non-Convex Optimization
- Authors: Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang
- Abstract summary: In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems.
We analyze stochastic methods for non-convex minimization, min-max, and bilevel optimization based on the variance recursion property of SEMA, matching or improving state-of-the-art results.
- Score: 105.22760323075008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we demonstrate the power of a widely used stochastic estimator
based on moving average (SEMA) on a range of stochastic non-convex optimization
problems, which only requires {\bf a general unbiased stochastic oracle}. We
analyze various stochastic methods (existing or newly proposed) based on the
{\bf variance recursion property} of SEMA for three families of non-convex
optimization, namely standard stochastic non-convex minimization, stochastic
non-convex strongly-concave min-max optimization, and stochastic bilevel
optimization. Our contributions include: (i) for standard stochastic non-convex
minimization, we present a simple and intuitive proof of convergence for a
family of Adam-style methods (including Adam) with an increasing or large
"momentum" parameter for the first-order moment, which gives an alternative yet
more natural way to guarantee that Adam converges; (ii) for stochastic non-convex
strongly-concave min-max optimization, we present a single-loop stochastic
gradient descent ascent method based on the moving average estimators and
establish its oracle complexity of $O(1/\epsilon^4)$ without using a large
mini-batch size, addressing a gap in the literature; (iii) for stochastic
bilevel optimization, we present a single-loop stochastic method based on the
moving average estimators and establish its oracle complexity of $\widetilde
O(1/\epsilon^4)$ without computing the inverse or SVD of the Hessian matrix,
improving state-of-the-art results. For all these problems, we also establish a
variance diminishing result for the used stochastic gradient estimators.
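To make the SEMA recursion concrete, below is a minimal, self-contained sketch (Python/NumPy; not code from the paper) of the moving-average gradient estimator z_t = (1 - beta_t) * z_{t-1} + beta_t * g_t driven by a general unbiased stochastic oracle and plugged into an Adam-style step. The toy quadratic objective, the step size, and the decreasing beta_t schedule (equivalently, an increasing "momentum" parameter 1 - beta_t for the first-order moment) are illustrative assumptions, not the exact parameters analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d)) / np.sqrt(d)
H = A.T @ A + 0.1 * np.eye(d)              # toy smooth objective F(x) = 0.5 * x^T H x

def stochastic_grad(x, noise_std=0.5):
    # General unbiased stochastic oracle: true gradient plus zero-mean noise.
    return H @ x + noise_std * rng.standard_normal(d)

x = rng.standard_normal(d)
z = np.zeros(d)        # SEMA estimate of the gradient (first-order moment)
v = np.zeros(d)        # coordinate-wise second-moment estimate, as in Adam-style methods
eta, eps = 0.05, 1e-8

for t in range(1, 2001):
    g = stochastic_grad(x)
    beta = min(1.0, 5.0 / t)               # decreasing beta_t, i.e. increasing momentum 1 - beta_t (hypothetical schedule)
    z = (1.0 - beta) * z + beta * g        # SEMA update: z_t = (1 - beta_t) * z_{t-1} + beta_t * g_t
    v = 0.999 * v + 0.001 * g * g          # Adam-style second moment
    x = x - eta * z / (np.sqrt(v) + eps)   # Adam-style step driven by the SEMA estimate

print("final gradient norm:", np.linalg.norm(H @ x))

Because z averages many past oracle calls, its error relative to the true gradient can shrink as the iterates stabilize, which is the intuition behind the variance recursion and variance diminishing results stated above, and no large mini-batch is required.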
Related papers
- A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Non-convex optimization is central to machine learning, but general non-convexity only permits convergence guarantees that are too weak and pessimistic compared to what is observed in practice.
We introduce a novel unified parametric assumption for analyzing non-convex algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z) - Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Faster Riemannian Newton-type Optimization by Subsampling and Cubic
Regularization [3.867143522757309]
This work is on constrained large-scale non-convex optimization where the constraint set implies a manifold structure.
We propose a new second-order Riemannian optimization algorithm, aiming at improving convergence and reducing computational cost.
arXiv Detail & Related papers (2023-02-22T00:37:44Z) - A theoretical and empirical study of new adaptive algorithms with
additional momentum steps and shifted updates for stochastic non-convex
optimization [0.0]
Adaptive optimization algorithms are widely regarded as a key pillar behind the success of the Deep Learning field.
In this paper we introduce adaptive momentum techniques for different problems with non-smooth objectives.
arXiv Detail & Related papers (2021-10-16T09:47:57Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient hypergradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model [34.093978443640616]
Recent methods have significantly narrowed the performance gap of Binary Neural Networks (BNNs), but guaranteeing their effective and efficient training remains an unsolved problem.
We propose the BAMSProd algorithm, built on the key observation that the convergence of training a deep binary model is strongly related to its quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic
Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant.
arXiv Detail & Related papers (2020-05-19T07:44:52Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)