A Continuized View on Nesterov Acceleration for Stochastic Gradient
Descent and Randomized Gossip
- URL: http://arxiv.org/abs/2106.07644v1
- Date: Thu, 10 Jun 2021 08:35:55 GMT
- Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion,
Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor
- Abstract summary: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.
We provide the first rigorous acceleration of asynchronous gossip algorithms.
- Score: 23.519404358187156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the continuized Nesterov acceleration, a close variant of
Nesterov acceleration whose variables are indexed by a continuous time
parameter. The two variables continuously mix following a linear ordinary
differential equation and take gradient steps at random times. This continuized
variant benefits from the best of the continuous and the discrete frameworks:
as a continuous process, one can use differential calculus to analyze
convergence and obtain analytical expressions for the parameters; and a
discretization of the continuized process can be computed exactly with
convergence rates similar to those of Nesterov's original acceleration. We show
that the discretization has the same structure as Nesterov acceleration, but
with random parameters. We provide continuized Nesterov acceleration under
deterministic as well as stochastic gradients, with either additive or
multiplicative noise. Finally, using our continuized framework and expressing
the gossip averaging problem as the stochastic minimization of a certain energy
function, we provide the first rigorous acceleration of asynchronous gossip
algorithms.
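The abstract states that an exact discretization of the continuized process has the same structure as Nesterov's original acceleration, only with random parameters. As a point of reference, the sketch below implements the classical deterministic Nesterov scheme for an L-smooth, mu-strongly convex quadratic; the function name, test problem, and parameter choices are illustrative assumptions, not the paper's randomized variant.

```python
import numpy as np

def nesterov_agd(grad, x0, L, mu, n_iters):
    """Classical Nesterov acceleration for an L-smooth, mu-strongly convex f."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # constant momentum coefficient
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iters):
        y = x + beta * (x - x_prev)      # extrapolation (momentum) step
        x_prev, x = x, y - grad(y) / L   # gradient step from the extrapolated point
    return x

# Illustrative quadratic f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 10.0])
x_star = nesterov_agd(lambda x: A @ x, np.array([5.0, 5.0]),
                      L=10.0, mu=1.0, n_iters=200)
```

In the continuized variant described above, the gradient steps would instead occur at random (Poisson-distributed) times, which turns the fixed coefficients here into random ones after discretization.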
Related papers
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under strong growth conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}} \max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Last-iterate convergence analysis of stochastic momentum methods for
neural networks [3.57214198937538]
The momentum method is used to solve large-scale optimization problems in neural networks.
Current convergence results for momentum methods hold only under artificial settings.
In our analysis, the momentum factor can be fixed to a constant, rather than the time-varying choices required in existing work.
arXiv Detail & Related papers (2022-05-30T02:17:44Z) - On the Hyperparameters in Stochastic Gradient Descent with Momentum [6.396288020763144]
We present a theoretical analysis of stochastic gradient descent with momentum (SGDM) in this paper.
By comparing with a surrogate rate, we show how the optimal linear rate of SGDM varies as the momentum parameter increases from zero.
Finally, we show that the surrogate momentum has no essential difference from the standard momentum under this rate.
arXiv Detail & Related papers (2021-08-09T11:25:03Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - The Connection between Discrete- and Continuous-Time Descriptions of
Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under 'coarse-graining'.
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of gradient descent, including the heavy-ball method, the stochastic Nesterov Accelerated Gradient (S-NAG) method, and the widely used Adam method.
Trap avoidance is studied by viewing the algorithm as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost
Functions [2.538209532048867]
As a byproduct, we show that the gradient converges and provide an explicit upper bound on the convergence rate.
This allows us to prove almost sure convergence via Doob's supermartingale convergence theorem.
arXiv Detail & Related papers (2020-12-01T16:48:59Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous-time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z) - On the Convergence of Nesterov's Accelerated Gradient Method in
Stochastic Settings [5.101801159418223]
We study Nesterov's accelerated gradient method with constant step-size and momentum parameters.
In the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum.
arXiv Detail & Related papers (2020-02-27T19:56:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.