Sequential convergence of AdaGrad algorithm for smooth convex
optimization
- URL: http://arxiv.org/abs/2011.12341v3
- Date: Tue, 13 Apr 2021 16:00:21 GMT
- Title: Sequential convergence of AdaGrad algorithm for smooth convex
optimization
- Authors: Cheik Traoré and Edouard Pauwels
- Abstract summary: We prove that the iterates produced by either the scalar step size variant or the coordinatewise variant of the AdaGrad algorithm are convergent sequences when applied to convex objective functions with Lipschitz gradient.
The key insight is to remark that such AdaGrad sequences satisfy a variable metric quasi-Fejér monotonicity property, which allows one to prove convergence.
- Score: 5.584060970507506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that the iterates produced by either the scalar step size
variant or the coordinatewise variant of the AdaGrad algorithm are convergent
sequences when applied to convex objective functions with Lipschitz gradient.
The key insight is to remark that such AdaGrad sequences satisfy a variable
metric quasi-Fejér monotonicity property, which allows one to prove convergence.
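The two variants discussed in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's analysis: the objective, the step size `eta`, and all function names are chosen here for demonstration. The scalar variant accumulates the full squared gradient norm into a single step size; the coordinatewise variant keeps one accumulator per coordinate.

```python
import math

def grad(x):
    """Gradient of the smooth convex test function f(x) = 0.5*x[0]**2 + 5*x[1]**2."""
    return [x[0], 10.0 * x[1]]

def adagrad(x0, steps=2000, eta=1.0, eps=1e-8, coordinatewise=True):
    """Run AdaGrad from x0; returns the final iterate."""
    x = list(x0)
    acc = [0.0] * len(x) if coordinatewise else 0.0
    for _ in range(steps):
        g = grad(x)
        if coordinatewise:
            # Coordinatewise variant: one squared-gradient accumulator per coordinate.
            for i in range(len(x)):
                acc[i] += g[i] ** 2
                x[i] -= eta * g[i] / (math.sqrt(acc[i]) + eps)
        else:
            # Scalar step size variant: a single accumulator of squared gradient norms.
            acc += sum(gi ** 2 for gi in g)
            step = eta / (math.sqrt(acc) + eps)
            x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_scalar = adagrad([3.0, -2.0], coordinatewise=False)
x_coord = adagrad([3.0, -2.0], coordinatewise=True)
print(x_scalar, x_coord)
```

On this quadratic, both sequences of iterates approach the unique minimizer at the origin, consistent with the sequential convergence the paper establishes for smooth convex objectives.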
Related papers
- Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent [0.3831327965422187]
This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD).
We use quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local variance in search directions.
arXiv Detail & Related papers (2023-11-28T17:03:56Z) - The Novel Adaptive Fractional Order Gradient Descent Algorithms Design
via Robust Control [5.5491171448159715]
The vanilla fractional order gradient descent may converge oscillatorily to a region around the global minimum instead of converging to the exact minimum point, or even diverge, even in the case where the objective function is strongly convex.
To address this problem, a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method are proposed in this paper.
arXiv Detail & Related papers (2023-03-08T02:03:30Z) - Asymptotic convergence of iterative optimization algorithms [1.6328866317851185]
This paper introduces a general framework for iterative optimization algorithms.
We prove that under appropriate assumptions, the rate of convergence can be lower bounded.
We provide the exact convergence rate.
arXiv Detail & Related papers (2023-02-24T09:58:56Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - On the Convergence of AdaGrad(Norm) on $\mathbb{R}^{d}$: Beyond Convexity,
Non-Asymptotic Rate and Acceleration [33.247600151322466]
We develop a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions.
First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems.
Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate.
arXiv Detail & Related papers (2022-09-29T14:44:40Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work considers zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Explicit Regularization of Stochastic Gradient Methods through Duality [9.131027490864938]
We propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent.
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing gradient methods in the interpolating regime.
arXiv Detail & Related papers (2020-03-30T20:44:56Z) - Optimization of Graph Total Variation via Active-Set-based Combinatorial
Reconditioning [48.42916680063503]
We propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class.
We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate.
Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
arXiv Detail & Related papers (2020-02-27T16:33:09Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence in expectation of adaptive gradient methods for nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods and motivate their improved design.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.