Related papers: Local Convergence of Adaptive Gradient Descent Optimizers

Local Convergence of Adaptive Gradient Descent Optimizers

URL: http://arxiv.org/abs/2102.09804v1
Date: Fri, 19 Feb 2021 08:36:13 GMT
Title: Local Convergence of Adaptive Gradient Descent Optimizers
Authors: Sebastian Bock and Martin Georg Wei{\ss}
Abstract summary: Adaptive Moment Estimation (ADAM) is a very popular algorithm for deep neural networks and belongs to the family of adaptive gradient descents. No complete analysis exists for ADAM. The contribution of this paper is a method for deterministic convergence analysis in batch mode.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adaptive Moment Estimation (ADAM) is a very popular training algorithm for deep neural networks and belongs to the family of adaptive gradient descent optimizers. However to the best of the authors knowledge no complete convergence analysis exists for ADAM. The contribution of this paper is a method for the local convergence analysis in batch mode for a deterministic fixed training set, which gives necessary conditions for the hyperparameters of the ADAM algorithm. Due to the local nature of the arguments the objective function can be non-convex but must be at least twice continuously differentiable. Then we apply this procedure to other adaptive gradient descent algorithms and show for most of them local convergence with hyperparameter bounds.

Related papers

Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling. We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity. We provide convergence guarantees for finite-sum minimization problems and show a faster convergence than SARAH can be achieved if local geometry permits. This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z)
GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online Optimization [4.103281325880475]
This paper deals with a network of computing agents aiming to solve an online optimization problem in a distributed fashion, by means of local computation and communication, without any central coordinator. We propose the gradient tracking with adaptive momentum estimation (GTAdam) distributed algorithm, which combines a gradient tracking mechanism with first and second order momentum estimates of the gradient. In these numerical experiments from multi-agent learning, GTAdam outperforms state-of-the-art distributed optimization methods.
arXiv Detail & Related papers (2020-09-03T15:20:21Z)
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
We propose adaptive learning rate principle, in which the running mean of squared gradient in Adam is replaced by a weighted mean, with weights chosen to maximize the estimated variance each coordinate. This results in faster adaptation, which leads more desirable empirical convergence behaviors.
arXiv Detail & Related papers (2020-06-21T21:47:43Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning [48.42916680063503]
We propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate. Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
arXiv Detail & Related papers (2020-02-27T16:33:09Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove adaptive gradient methods in expectation of gradient convergence methods. Our analyses shed light on better adaptive gradient methods in optimizing non understanding gradient bounds.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.