On Underdamped Nesterov's Acceleration
- URL: http://arxiv.org/abs/2304.14642v1
- Date: Fri, 28 Apr 2023 06:08:19 GMT
- Title: On Underdamped Nesterov's Acceleration
- Authors: Shuo Chen, Bin Shi, Ya-xiang Yuan
- Abstract summary: The high-resolution differential equation framework is tailor-made for Nesterov's accelerated gradient descent method.
New Lyapunov functions for the underdamped case are constructed.
The convergence rates are consistent with those obtained for the underdamped case with $r=-1$.
- Score: 6.53306151979817
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The high-resolution differential equation framework has been proven to be
tailor-made for Nesterov's accelerated gradient descent method (NAG) and its
proximal counterpart -- the class of faster iterative shrinkage-thresholding
algorithms (FISTA). However, the theory is still not complete, since the
underdamped case ($r < 2$) has not been covered. In this paper, based on the
high-resolution differential equation framework, we construct new Lyapunov
functions for the underdamped case, motivated by the power of the time
$t^{\gamma}$ or the iteration number $k^{\gamma}$ in the mixed term. When the
momentum parameter $r$ is $2$, the new Lyapunov functions are identical to the
previous ones. These new proofs not only recover the convergence rate of the
objective value previously obtained from the low-resolution differential
equation framework but also characterize the convergence rate of the minimal
squared gradient norm. All the convergence rates obtained for the underdamped
case depend continuously on the parameter $r$. In addition, it is observed
that the high-resolution differential equation approximately simulates the
convergence behavior of NAG for the critical case $r=-1$, while the
low-resolution differential equation degenerates to the conservative Newton's
equation. The high-resolution differential equation framework also
theoretically characterizes the convergence rates, which are consistent with
those obtained for the underdamped case at $r=-1$.
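For orientation, the generalized low-resolution ODE usually attached to this family is $\ddot{X}(t) + \frac{r+1}{t}\dot{X}(t) + \nabla f(X(t)) = 0$; at $r=-1$ the damping coefficient vanishes and the equation reduces to the conservative Newton's equation $\ddot{X}(t) + \nabla f(X(t)) = 0$. The sketch below is a hypothetical illustration (not the authors' code) of the discrete family this ODE models: generalized NAG with momentum coefficient $k/(k+r+1)$ and step size $s$, tracking the minimal squared gradient norm, one of the quantities whose rate the paper characterizes. The two-sequence form, the coefficient convention, and the test problem are assumptions made for illustration; the exact phase-space representation used in the paper may differ.

```python
# Hypothetical sketch, not the authors' implementation: generalized NAG with
# momentum parameter r in the two-sequence form
#     y_{k+1} = x_k - s * grad_f(x_k)
#     x_{k+1} = y_{k+1} + (k / (k + r + 1)) * (y_{k+1} - y_k)
# r = 2 gives the classical k/(k+3) momentum; r < 2 is the underdamped regime;
# r = -1 is the critical case discussed in the abstract.
import numpy as np

def nag_r(grad_f, x0, s=0.01, r=2.0, iters=1000):
    """Run generalized NAG; return the last iterate and the running minimum
    of the squared gradient norm."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    min_grad_sq = np.inf
    history = []
    for k in range(iters):
        y_next = x - s * grad_f(x)                  # gradient step
        denom = k + r + 1
        coeff = k / denom if denom != 0 else 0.0    # guard the k = 0, r = -1 case
        x = y_next + coeff * (y_next - y)           # momentum step with parameter r
        y = y_next
        g = grad_f(x)
        min_grad_sq = min(min_grad_sq, float(g @ g))
        history.append(min_grad_sq)
    return x, history

if __name__ == "__main__":
    # Simple convex quadratic f(x) = 0.5 * x^T A x with L = 100, so take s <= 1/L.
    A = np.diag([1.0, 10.0, 100.0])
    grad_f = lambda x: A @ x
    for r in (2.0, 0.5, -1.0):                      # classical, underdamped, critical
        _, hist = nag_r(grad_f, np.ones(3), s=0.005, r=r, iters=2000)
        print(f"r = {r:+.1f}   min ||grad f||^2 after 2000 steps: {hist[-1]:.3e}")
```

In runs of this kind the classical $r=2$ setting typically drives the minimal squared gradient norm down quickly, while the undamped $r=-1$ run tends to oscillate, mirroring the conservative Newton's-equation behavior mentioned in the abstract.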
Related papers
- A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation [12.321507997896218]
We study the dynamics of gradient flow with small weight decay on general training losses $F: \mathbb{R}^d \to \mathbb{R}$.
arXiv Detail & Related papers (2025-05-26T16:12:45Z) - Variance-Reduced Fast Operator Splitting Methods for Stochastic Generalized Equations [8.0153031008486]
We introduce a class of variance-reduced estimators and establish their variance-reduction bounds.
Next, we design a novel accelerated variance-reduced forward-backward splitting (FBS) algorithm.
Our method achieves both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ convergence rates on the expected norm.
arXiv Detail & Related papers (2025-04-17T16:02:20Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Over-Parameterization Exponentially Slows Down Gradient Descent for
Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O\left(T^{-3}\right)$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z) - Revisiting the acceleration phenomenon via high-resolution differential
equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for the $\mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
arXiv Detail & Related papers (2022-12-12T04:36:37Z) - Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
arXiv Detail & Related papers (2022-11-03T06:50:19Z) - A Primal-Dual Approach to Solving Variational Inequalities with General Constraints [54.62996442406718]
Yang et al. (2023) recently showed how to use first-order gradient methods to solve general variational inequalities.
We prove the convergence of this method and show that the gap function of the last iterate of the method decreases at a rate of $O\left(\frac{1}{\sqrt{K}}\right)$ when the operator is $L$-Lipschitz and monotone.
arXiv Detail & Related papers (2022-10-27T17:59:09Z) - Gradient Norm Minimization of Nesterov Acceleration: $o(1/k^3)$ [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones of first-order algorithms.
In this paper, we provide a significantly simplified proof based on precise observation and a tighter inequality for $L$-smooth functions.
arXiv Detail & Related papers (2022-09-19T09:10:35Z) - A globally convergent fast iterative shrinkage-thresholding algorithm
with a new momentum factor for single and multi-objective convex optimization [0.0]
We show that our proposed method also has a global convergence rate of $O(1/k^2)$ for any $(a,b)$.
We report numerical results with various $(a,b)$, showing that some of these choices give better results than the classical momentum factors.
arXiv Detail & Related papers (2022-05-11T04:26:00Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Cumulant GAN [17.4556035872983]
We propose a novel loss function for training Generative Adversarial Networks (GANs)
We show that the corresponding optimization problem is equivalent to Rényi divergence minimization.
We experimentally demonstrate that image generation is more robust relative to Wasserstein GAN.
arXiv Detail & Related papers (2020-06-11T17:23:02Z)