UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic
Optimization
- URL: http://arxiv.org/abs/2305.05675v1
- Date: Tue, 9 May 2023 13:07:03 GMT
- Title: UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic
Optimization
- Authors: Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo P. Mandic
- Abstract summary: We introduce a unified framework for Adam-type algorithms (called UAdam).
It is equipped with a general form of the second-order moment, which covers Adam and its variants, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan, as special cases.
We show that UAdam converges to the neighborhood of stationary points with the rate of $\mathcal{O}(1/T)$.
- Score: 20.399244578926474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adam-type algorithms have become a preferred choice for optimisation in the
deep learning setting; however, despite their success, their convergence is still not
well understood. To this end, we introduce a unified framework for Adam-type
algorithms (called UAdam). This is equipped with a general form of the
second-order moment, which makes it possible to include Adam and its variants
as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. This is
supported by a rigorous convergence analysis of UAdam in the non-convex
stochastic setting, showing that UAdam converges to the neighborhood of
stationary points with the rate of $\mathcal{O}(1/T)$. Furthermore, the size of
the neighborhood decreases as $\beta$ increases. Importantly, our analysis only
requires the first-order momentum factor to be close enough to 1, without any
restrictions on the second-order momentum factor. Theoretical results also show
that vanilla Adam can converge by selecting appropriate hyperparameters, which
provides a theoretical guarantee for the analysis, applications, and further
developments of the whole class of Adam-type algorithms.
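A minimal sketch of the "general second-order moment" idea from the abstract: a generic Adam-type step that receives the second-moment recursion as a function argument, so that vanilla Adam and an AMSGrad-style variant differ only in that argument. The function names, hyperparameter values, and the simplified AMSGrad rule are illustrative assumptions, not the paper's exact UAdam recursion or conditions.

```python
import numpy as np

def adam_second_moment(v_prev, grad, beta2=0.999):
    """Vanilla Adam: exponential moving average of squared gradients."""
    return beta2 * v_prev + (1.0 - beta2) * grad**2

def amsgrad_second_moment(v_prev, grad, beta2=0.999):
    """AMSGrad-style rule (simplified here): keep the second moment non-decreasing."""
    return np.maximum(v_prev, beta2 * v_prev + (1.0 - beta2) * grad**2)

def adam_type_step(theta, m, v, grad, second_moment, lr=1e-2, beta1=0.99, eps=1e-8):
    """One generic Adam-type step; `second_moment` selects the variant."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-order momentum, beta1 close to 1
    v = second_moment(v, grad)                    # variant-specific second-order moment
    theta = theta - lr * m / (np.sqrt(v) + eps)   # adaptive parameter update
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 / 2 from noisy gradients.
rng = np.random.default_rng(0)
theta = np.ones(5)
m, v = np.zeros(5), np.zeros(5)
for _ in range(2000):
    grad = theta + 0.1 * rng.standard_normal(5)   # stochastic gradient oracle
    theta, m, v = adam_type_step(theta, m, v, grad, adam_second_moment)
print(np.linalg.norm(theta))  # ends up in a small neighborhood of the minimizer at 0
```

Swapping `adam_second_moment` for `amsgrad_second_moment` (or a clipped rule in the spirit of AdaBound) changes only the second-moment argument, which is the sense in which such variants are special cases of one template.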
Related papers
- A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD [28.905886549938305]
We introduce a novel and comprehensive framework for analyzing the convergence properties of Adam.
We show that Adam attains non-asymptotic sample complexity bounds similar to those of gradient descent.
arXiv Detail & Related papers (2024-10-06T12:15:00Z) - On Convergence of Adam for Stochastic Optimization under Relaxed
Assumptions [4.9495085874952895]
The Adaptive Momentum Estimation (Adam) algorithm is highly effective in various deep learning tasks.
We show that, with high probability, Adam can find a stationary point under this general noise model.
arXiv Detail & Related papers (2024-02-06T13:19:26Z) - Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters
and Non-ergodic Case [0.0]
This paper focuses on exploring the convergence of vanilla Adam and the challenges of non-ergodic convergence.
These findings build a solid theoretical foundation for Adam to solve non-ergodic optimization problems.
arXiv Detail & Related papers (2023-07-20T12:02:17Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $\mathcal{O}(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $\mathcal{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Provable Adaptivity of Adam under Non-uniform Smoothness [79.25087082434975]
Adam is widely adopted in practical applications due to its fast convergence.
Existing convergence analyses for Adam rely on the bounded smoothness assumption.
This paper studies the convergence of randomly reshuffled Adam with diminishing learning rate.
arXiv Detail & Related papers (2022-08-21T14:57:47Z) - A Novel Convergence Analysis for Algorithms of the Adam Family [105.22760323075008]
We present a generic proof of convergence for a family of Adam-style methods including Adam, AMSGrad, Adabound, etc.
Our analysis is simple and generic, and can be leveraged to establish convergence for a broader family of non-convex compositional optimization problems.
arXiv Detail & Related papers (2021-12-07T02:47:58Z) - Adam$^+$: A Stochastic Method with Adaptive Variance Reduction [56.051001950733315]
Adam is a widely used optimization method for deep learning applications.
We propose a new method named Adam$^+$ (pronounced as Adam-plus).
Our empirical studies on various deep learning tasks, including image classification, language modeling, and automatic speech recognition, demonstrate that Adam$^+$ significantly outperforms Adam.
arXiv Detail & Related papers (2020-11-24T09:28:53Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence for both Adam and Adagrad, with a rate of $\mathcal{O}(d\ln(N)/\sqrt{N})$.
Adam converges at the same $\mathcal{O}(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z) - Non-asymptotic Convergence of Adam-type Reinforcement Learning
Algorithms under Markovian Sampling [56.394284787780364]
This paper provides the first theoretical convergence analysis for two fundamental RL algorithms: policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2020-02-15T00:26:49Z)