Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity
- URL: http://arxiv.org/abs/2306.09694v2
- Date: Tue, 9 Apr 2024 00:43:58 GMT
- Title: Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity
- Authors: Bowen Li, Bin Shi, Ya-xiang Yuan,
- Abstract summary: We show that both Nesterov's accelerated gradient descent (NAG) and FISTA exhibit linear convergence for strongly convex functions.
We emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy.
- Score: 14.0409219811182
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A significant milestone in modern gradient-based optimization was achieved with the development of Nesterov's accelerated gradient descent (NAG) method. This forward-backward technique has been further advanced with the introduction of its proximal generalization, commonly known as the fast iterative shrinkage-thresholding algorithm (FISTA), which enjoys widespread application in image science and engineering. Nonetheless, it remains unclear whether both NAG and FISTA exhibit linear convergence for strongly convex functions. Remarkably, these algorithms demonstrate convergence without requiring any prior knowledge of strongly convex modulus, and this intriguing characteristic has been acknowledged as an open problem in the comprehensive review [Chambolle and Pock, 2016, Appendix B]. In this paper, we address this question by utilizing the high-resolution ordinary differential equation (ODE) framework. Expanding upon the established phase-space representation, we emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy that evolves throughout the iterations. Furthermore, we highlight that the linear convergence of both NAG and FISTA is independent of the parameter $r$. Additionally, we demonstrate that the square of the proximal subgradient norm likewise advances towards linear convergence.
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Understanding Accelerated Gradient Methods: Lyapunov Analyses and
Hamiltonian Assisted Interpretations [1.0152838128195465]
We formulate two classes of first-order algorithms more general than previously studied for minimizing smooth and strongly convex functions.
We establish sufficient conditions, via new discrete Lyapunov analyses, for achieving accelerated convergence rates which match Nesterov's methods in the strongly and general convex settings.
We propose a novel class of discrete algorithms, called the Hamiltonian assisted gradient method, directly based on a Hamiltonian function and several interpretable operations.
arXiv Detail & Related papers (2023-04-20T03:03:30Z) - Online Learning Guided Curvature Approximation: A Quasi-Newton Method
with Global Non-Asymptotic Superlinear Convergence [22.286753988825048]
We present the first globally convergent quasi-Newton method with an explicit non-asymptotic superlinear convergence rate.
Unlike classical quasi-Newton methods, we build our algorithm upon the hybrid proximal extragradient method.
arXiv Detail & Related papers (2023-02-16T20:58:09Z) - Revisiting the acceleration phenomenon via high-resolution differential
equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for the $mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
arXiv Detail & Related papers (2022-12-12T04:36:37Z) - Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
arXiv Detail & Related papers (2022-11-03T06:50:19Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
convergence rate analysis of the mean field Langevin dynamics is presented.
$p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.