Last-iterate convergence analysis of stochastic momentum methods for
neural networks
- URL: http://arxiv.org/abs/2205.14811v1
- Date: Mon, 30 May 2022 02:17:44 GMT
- Title: Last-iterate convergence analysis of stochastic momentum methods for
neural networks
- Authors: Dongpo Xu, Jinlan Liu, Yinghua Lu, Jun Kong, Danilo Mandic
- Abstract summary: The momentum method is used to solve large-scale optimization problems in neural networks.
Current convergence results of momentum methods under artificial settings.
The momentum factors can be fixed to be constant, rather than in existing time.
- Score: 3.57214198937538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic momentum method is a commonly used acceleration technique for
solving large-scale stochastic optimization problems in artificial neural
networks. Current convergence results of stochastic momentum methods under
non-convex stochastic settings mostly discuss convergence in terms of the
random output and minimum output. To this end, we address the convergence of
the last iterate output (called last-iterate convergence) of the stochastic
momentum methods for non-convex stochastic optimization problems, in a way
conformal with traditional optimization theory. We prove the last-iterate
convergence of the stochastic momentum methods under a unified framework,
covering both stochastic heavy ball momentum and stochastic Nesterov
accelerated gradient momentum. The momentum factors can be fixed to be
constant, rather than time-varying coefficients in existing analyses. Finally,
the last-iterate convergence of the stochastic momentum methods is verified on
the benchmark MNIST and CIFAR-10 datasets.
Related papers
- PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash
Equilibrium [62.51015395213579]
We consider the non-AL equilibrium nonconptotic objective function in two-player zero-sum continuous games.
The proposed algorithm employs the movements of particles to represent the updates of random strategies for the $ilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - On Almost Sure Convergence Rates of Stochastic Gradient Methods [11.367487348673793]
We show for the first time that the almost sure convergence rates obtained for gradient methods are arbitrarily close to their optimal convergence rates possible.
For non- objective functions, we only show that a weighted average of squared gradient norms converges not almost surely, but also lasts to zero almost surely.
arXiv Detail & Related papers (2022-02-09T06:05:30Z) - Convergence and Stability of the Stochastic Proximal Point Algorithm
with Momentum [14.158845925610438]
We show how a gradient proximal algorithm with momentum (PPA) allows faster convergence to a neighborhood compared to the proximal algorithm (PPA) with better contraction factor.
arXiv Detail & Related papers (2021-11-11T12:17:22Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Escaping Saddle Points Faster with Stochastic Momentum [9.485782209646445]
In deep networks, momentum appears to significantly improve convergence time.
We show that momentum improves deep training because it modifies SGD to escape points faster.
We also show how to choose the ideal momentum parameter.
arXiv Detail & Related papers (2021-06-05T23:34:02Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - A New Accelerated Stochastic Gradient Method with Momentum [4.967897656554012]
gradient descent with momentum (Sgdm) use weights that decay exponentially with the iteration times to generate an momentum term.
We provide theoretical convergence properties analyses for our method, which show both the exponentially decay weights and our inverse proportionally decay weights can limit the variance of the moving direction of parameters to be optimized to a region.
arXiv Detail & Related papers (2020-05-31T03:04:32Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic
Perspectives [97.16266088683061]
The article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms.
It provides a characterization of algorithms that exhibit accelerated convergence.
arXiv Detail & Related papers (2020-02-28T00:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.