Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis
- URL: http://arxiv.org/abs/2409.05023v1
- Date: Sun, 8 Sep 2024 08:29:51 GMT
- Title: Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis
- Authors: Ruinan Jin, Xiaoyu Wang, Baoxiang Wang
- Abstract summary: Adaptive optimizers have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on iterative gradients.
These methods have achieved significant success in various deep learning tasks; AdaGrad, a cornerstone adaptive optimizer, is the focus of this work.
- Score: 17.34603953600226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive optimizers have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on iterative gradients. These adaptive methods have significantly succeeded in various deep learning tasks, outperforming stochastic gradient descent (SGD). However, although AdaGrad is a cornerstone adaptive optimizer, its theoretical analysis is inadequate in addressing asymptotic convergence and non-asymptotic convergence rates on non-convex optimization. This study aims to provide a comprehensive analysis and complete picture of AdaGrad. We first introduce a novel stopping time technique from probability theory to establish stability for the norm version of AdaGrad under milder conditions. We further derive two forms of asymptotic convergence: almost sure and mean-square. Furthermore, under mild assumptions, we demonstrate a near-optimal non-asymptotic convergence rate measured by the average squared gradient in expectation, a metric that is rarely explored and stronger than the existing high-probability results. The techniques developed in this work are potentially of independent interest for future research on other adaptive stochastic algorithms.
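For context, the norm version of AdaGrad analyzed in the abstract keeps a single scalar accumulator of squared gradient norms and scales every coordinate by the same factor. Below is a minimal sketch of that standard AdaGrad-Norm update rule, not the paper's own pseudocode; the step size `eta`, the initial accumulator `b0`, and the toy noisy quadratic objective are illustrative choices only.

```python
import numpy as np

def adagrad_norm(grad_fn, x0, eta=0.5, b0=1e-8, num_steps=1000):
    """Minimal sketch of the norm version of AdaGrad (AdaGrad-Norm).

    A single scalar accumulator b_t sums the squared gradient norms,
    and every coordinate is scaled by the same factor eta / sqrt(b_t).
    grad_fn(x) should return a (possibly stochastic) gradient at x.
    """
    x = np.asarray(x0, dtype=float).copy()
    b = b0  # scalar accumulator; b0 > 0 avoids division by zero
    for _ in range(num_steps):
        g = grad_fn(x)
        b += float(np.dot(g, g))       # b_t = b_{t-1} + ||g_t||^2
        x -= eta / np.sqrt(b) * g      # x_{t+1} = x_t - (eta / sqrt(b_t)) * g_t
    return x

# Illustrative usage: noisy gradients of f(x) = 0.5 * ||x||^2
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_final = adagrad_norm(noisy_grad, x0=np.ones(5), num_steps=2000)
print(np.linalg.norm(x_final))  # for this objective, equals the gradient norm at x_final
```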
Related papers
- High Probability Analysis for Non-Convex Stochastic Optimization with
Clipping [13.025261730510847]
Gradient clipping is a technique for dealing with heavy-tailed noise when training neural networks.
Most theoretical guarantees only provide an in-expectation analysis of the performance.
Our analysis provides a relatively complete picture of the theoretical guarantees for optimization algorithms with gradient clipping.
arXiv Detail & Related papers (2023-07-25T17:36:56Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes the gradient Lipschitz setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that non-concave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - A theoretical and empirical study of new adaptive algorithms with
additional momentum steps and shifted updates for stochastic non-convex
optimization [0.0]
It is thought that adaptive optimization algorithms represent the key pillar behind the success of the Deep Learning field.
In this paper we introduce adaptive momentum techniques for different non-smooth objective problems.
arXiv Detail & Related papers (2021-10-16T09:47:57Z) - A High Probability Analysis of Adaptive SGD with Momentum [22.9530287983179]
Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications.
We show for the first time the high-probability convergence of the gradients to zero in the smooth non-convex setting for Delayed AdaGrad with momentum.
arXiv Detail & Related papers (2020-07-28T15:06:22Z) - Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite
Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random shuffling SGD after finite time.
To the best of our knowledge, this is the first work to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv Detail & Related papers (2020-06-12T09:39:47Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-convex non-concave min-max problems.
Our experiments on GAN training show that the advantage of adaptive over non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove in-expectation convergence of the gradient norm for adaptive gradient methods in non-convex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods under relaxed gradient bounds.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)