On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
- URL: http://arxiv.org/abs/1808.05671v4
- Date: Thu, 20 Jun 2024 16:13:13 GMT
- Title: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
- Authors: Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu
- Abstract summary: We prove that a general class of adaptive gradient methods, including AMSGrad, RMSProp and AdaGrad, converges in expectation to a first-order stationary point for smooth nonconvex functions.
Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
- Score: 80.03647903934723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
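For readers unfamiliar with this family of methods, the sketch below illustrates the generic adaptive update that AMSGrad, RMSProp and AdaGrad instantiate. It is a minimal NumPy illustration based on the standard definitions of these methods, not code from the paper; the toy objective and all parameter names are ours.
```python
import numpy as np

def adaptive_gradient_descent(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                              eps=1e-8, amsgrad=True, steps=1000):
    """Generic adaptive update: x_{t+1} = x_t - lr * m_t / (sqrt(v_hat_t) + eps).

    amsgrad=True gives an AMSGrad-style rule; beta1=0 and amsgrad=False gives
    RMSProp. AdaGrad instead accumulates a running *sum* of squared gradients
    (replace the EMA on v with `v += g * g` and set beta1 = 0).
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)      # first-moment (momentum) estimate
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # elementwise running max of v (AMSGrad correction)
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum on the gradient
        v = beta2 * v + (1 - beta2) * g * g    # EMA of squared gradients
        v_hat = np.maximum(v_hat, v) if amsgrad else v
        x = x - lr * m / (np.sqrt(v_hat) + eps)  # coordinate-wise step sizes
    return x

# Toy usage: a smooth nonconvex objective f(x) = sum(x_i^2 + sin(3 x_i)).
grad_f = lambda x: 2 * x + 3 * np.cos(3 * x)
print(adaptive_gradient_descent(grad_f, x0=np.ones(5)))
```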
Related papers
- A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality [5.35599092568615]
We show that AdaGrad and Adam converge linearly when the cost function is smooth and satisfies the Polyak-Lojasiewicz inequality.
Our framework can potentially be utilized in analyzing linear convergence of other variants of Adam.
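For context, the Polyak-Lojasiewicz (PL) inequality mentioned above, together with smoothness, is the classical route to linear rates; the paper extends such guarantees to AdaGrad and Adam. The block below states the standard definition and the textbook gradient-descent rate under these assumptions (notation is the usual one, not necessarily the paper's).
```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
A function $f$ with minimum value $f^\ast$ satisfies the Polyak--Lojasiewicz
(PL) inequality with constant $\mu > 0$ if
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu\bigl(f(x) - f^\ast\bigr)
  \qquad \text{for all } x .
\]
Combined with $L$-smoothness, this yields the classical linear rate of
gradient descent with step size $1/L$:
\[
  f(x_{t+1}) - f^\ast \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_t) - f^\ast\bigr).
\]
\end{document}
```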
arXiv Detail & Related papers (2024-07-17T14:56:21Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which restricts the class of functions the analysis covers.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded
Gradients and Affine Variance [46.15915820243487]
We show that AdaGrad-Norm exhibits an order-optimal convergence rate in this setting, even under unbounded gradients and affine variance noise.
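As background, AdaGrad-Norm adapts a single scalar step size from the accumulated squared gradient norms. The sketch below shows the standard update; the function signature and defaults are illustrative, not taken from the paper.
```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, steps=1000):
    """AdaGrad-Norm: one scalar step size shared across all coordinates.

    b_t^2 = b_{t-1}^2 + ||g_t||^2,   x_{t+1} = x_t - (eta / b_t) * g_t
    """
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b_sq += np.dot(g, g)                  # accumulate squared gradient norms
        x = x - (eta / np.sqrt(b_sq)) * g     # shrinking, self-tuned step size
    return x
```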
arXiv Detail & Related papers (2022-02-11T17:37:54Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework of adaptive algorithms for solving general problems.
In particular, our novel framework provides convergence guarantees for adaptive methods in the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Constrained and Composite Optimization via Adaptive Sampling Methods [3.4219044933964944]
The motivation for this paper stems from the desire to develop an adaptive sampling method for solving constrained optimization problems.
The method proposed in this paper is a proximal gradient method that can also be applied to the composite optimization problem min f(x) + h(x), where h is convex (but not necessarily differentiable).
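As background for the composite setting, the sketch below shows a plain (deterministic, non-adaptive) proximal gradient iteration for min f(x) + lambda*||x||_1, where the prox of the l1 term is soft-thresholding. It illustrates the problem class only, not the adaptive sampling method proposed in the paper; the least-squares example is ours.
```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(grad_f, x0, lam=0.1, step=0.01, iters=500):
    """Minimize f(x) + lam*||x||_1: gradient step on f, then prox of h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

# Usage on a least-squares problem f(x) = 0.5 * ||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - b), x0=np.zeros(5))
print(x_hat)
```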
arXiv Detail & Related papers (2020-12-31T02:50:39Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
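For background, a zeroth-order method replaces gradients with finite-difference estimates built from function queries alone, and coordinate importance sampling decides which coordinates to query more often. The sketch below is a generic coordinate-wise estimator with optional non-uniform sampling, not the specific hybrid design of the paper; all names and defaults are illustrative.
```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-4, num_samples=10, probs=None, rng=None):
    """Zeroth-order gradient estimate from function queries only.

    Repeatedly samples a coordinate i (optionally with non-uniform probabilities),
    takes a central finite difference along e_i, and importance-weights by
    1/(num_samples * probs[i]) so the estimate is unbiased as mu -> 0.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    probs = probs if probs is not None else np.full(d, 1.0 / d)
    g = np.zeros(d)
    for _ in range(num_samples):
        i = rng.choice(d, p=probs)
        e = np.zeros(d)
        e[i] = 1.0
        diff = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)  # two function queries
        g[i] += diff / (num_samples * probs[i])
    return g

# Usage: estimate the gradient of a black-box quadratic f(x) = sum(x^2).
f = lambda x: float(np.sum(x ** 2))
print(zo_gradient_estimate(f, np.ones(8)))  # approx. 2*x in expectation
```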
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Adaptive Gradient Methods for Constrained Convex Optimization and
Variational Inequalities [32.51470158863247]
Our main algorithms, AdaACSA and AdaAGD+, are accelerated methods for constrained convex optimization.
We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate.
arXiv Detail & Related papers (2020-07-17T09:10:21Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the OptimisticOA algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)