Revisiting the acceleration phenomenon via high-resolution differential
equations
- URL: http://arxiv.org/abs/2212.05700v1
- Date: Mon, 12 Dec 2022 04:36:37 GMT
- Title: Revisiting the acceleration phenomenon via high-resolution differential
equations
- Authors: Shuo Chen, Bin Shi, Ya-xiang Yuan
- Abstract summary: Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for the $mu$-strongly convex function based on the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
- Score: 6.53306151979817
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Nesterov's accelerated gradient descent (NAG) is one of the milestones in the
history of first-order algorithms. It was not successfully uncovered until the
high-resolution differential equation framework was proposed in [Shi et al.,
2022] that the mechanism behind the acceleration phenomenon is due to the
gradient correction term. To deepen our understanding of the high-resolution
differential equation framework on the convergence rate, we continue to
investigate NAG for the $\mu$-strongly convex function based on the techniques
of Lyapunov analysis and phase-space representation in this paper. First, we
revisit the proof from the gradient-correction scheme. Similar to [Chen et al.,
2022], the straightforward calculation simplifies the proof extremely and
enlarges the step size to $s=1/L$ with minor modification. Meanwhile, the way
of constructing Lyapunov functions is principled. Furthermore, we also
investigate NAG from the implicit-velocity scheme. Due to the difference in the
velocity iterates, we find that the Lyapunov function is constructed from the
implicit-velocity scheme without the additional term and the calculation of
iterative difference becomes simpler. Together with the optimal step size
obtained, the high-resolution differential equation framework from the
implicit-velocity scheme of NAG is perfect and outperforms the
gradient-correction scheme.
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity [14.0409219811182]
We show that both Nesterov's accelerated gradient descent (NAG) and FISTA exhibit linear convergence for strongly convex functions.
We emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy.
arXiv Detail & Related papers (2023-06-16T08:58:40Z) - Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of faster iterative shrinkage-thresholding algorithms (FISTA) is accelerated to convergence at an inverse cubic rate.
arXiv Detail & Related papers (2022-11-03T06:50:19Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to emphNon-uniform Smoothness (NS) and emphNon-uniform Lojasiewicz inequality (NL)
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $Omega (1/t2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-varying games and iteration.
We provide emphfor the first time a global convergence rate for the negative momentum method(NM) with an complexity $mathcalO(kappa1.5)$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenonemon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.