A New Accelerated Stochastic Gradient Method with Momentum
- URL: http://arxiv.org/abs/2006.00423v1
- Date: Sun, 31 May 2020 03:04:32 GMT
- Title: A New Accelerated Stochastic Gradient Method with Momentum
- Authors: Liang Liu and Xiaopeng Luo
- Abstract summary: gradient descent with momentum (Sgdm) use weights that decay exponentially with the iteration times to generate an momentum term.
We provide theoretical convergence properties analyses for our method, which show both the exponentially decay weights and our inverse proportionally decay weights can limit the variance of the moving direction of parameters to be optimized to a region.
- Score: 4.967897656554012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel accelerated stochastic gradient method
with momentum, in which the momentum term is a weighted average of previous
gradients and the weights decay in inverse proportion to the iteration count.
Stochastic gradient descent with momentum (Sgdm) uses weights that decay
exponentially with the iteration count to generate its momentum term. Using
exponentially decaying weights, variants of Sgdm with carefully designed and
more complicated formats have been proposed to achieve better performance. The
momentum update rule of our method is as simple as that of Sgdm. We provide a
theoretical convergence analysis for our method, which shows that both
exponentially decaying weights and our inversely proportionally decaying
weights confine the variance of the moving direction of the optimized
parameters to a bounded region. Experimental results show that our method
performs well on practical problems: it outperforms Sgdm, and it outperforms
Adam on convolutional neural networks.
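The contrast between the two weighting schemes can be made concrete with a short sketch. The Sgdm-style recursion below is standard; the inverse-proportional recursion m_t = (1 - 1/t) m_{t-1} + (1/t) g_t, under which every past gradient's weight in m_t equals 1/t, is an illustrative reading of the abstract, not necessarily the paper's exact coefficients.

```python
import numpy as np

def sgdm_momentum(m, grad, beta=0.9):
    # Sgdm: unrolling m_t = beta * m_{t-1} + (1 - beta) * g_t shows the
    # weight on the gradient from k steps ago is (1 - beta) * beta**k,
    # i.e. it decays exponentially with the iteration count.
    return beta * m + (1.0 - beta) * grad

def inverse_decay_momentum(m, grad, t):
    # Illustrative inverse-proportional weighting (t = 1, 2, ...):
    # m_t = (1 - 1/t) * m_{t-1} + (1/t) * g_t, so the weight of every
    # past gradient in m_t is 1/t and decays inversely with t.
    lam = 1.0 / t
    return (1.0 - lam) * m + lam * grad

# Toy usage on f(theta) = 0.5 * ||theta||^2 with noisy gradients.
rng = np.random.default_rng(0)
theta, m, lr = np.ones(5), np.zeros(5), 0.1
for t in range(1, 101):
    grad = theta + 0.1 * rng.standard_normal(5)  # stochastic gradient
    m = inverse_decay_momentum(m, grad, t)
    theta -= lr * m
```

Under the 1/t recursion the momentum is the plain running average of all past gradients, so each gradient's influence shrinks inversely with the iteration count rather than exponentially.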
Related papers
- Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
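Spectral normalization in general rescales a weight matrix so its largest singular value is bounded, which caps the Lipschitz constant of the corresponding layer. A minimal power-iteration sketch follows; the paper's specific application to long model unrolls in RP PGMs is not reproduced here.

```python
import numpy as np

def spectral_normalize(W, n_iters=20, seed=0):
    # Estimate the largest singular value of W by power iteration and
    # rescale W so that value is at most 1 (a sketch of the generic
    # technique only; the paper's use inside model-based RP PGMs has
    # its own details and hyperparameters).
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # spectral-norm estimate
    return W / max(sigma, 1.0)

W = np.random.default_rng(1).standard_normal((64, 32))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # ~1.0 when the original norm exceeded 1
```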
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Last-iterate convergence analysis of stochastic momentum methods for
neural networks [3.57214198937538]
The momentum method is used to solve large-scale optimization problems in neural networks.
Existing convergence results for momentum methods are obtained under artificial settings.
The momentum factor can be fixed to a constant, rather than varying with time as required in existing analyses.
arXiv Detail & Related papers (2022-05-30T02:17:44Z) - Momentum Doesn't Change the Implicit Bias [36.301490759243876]
We analyze the implicit bias of momentum-based optimization.
We construct new Lyapunov functions as a tool to analyze the gap between the model parameter and the max-margin solution.
arXiv Detail & Related papers (2021-10-08T04:37:18Z) - Accelerate Distributed Stochastic Descent for Nonconvex Optimization
with Momentum [12.324457683544132]
We propose a momentum method for model-averaging approaches to distributed training.
We analyze the convergence and scaling properties of such momentum methods.
Our experimental results show that block momentum not only accelerates training, but also achieves better results.
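The idea can be sketched as blockwise model-update filtering: treat the aggregated model delta from one round of averaging as a pseudo-gradient and smooth it with a momentum buffer. The rule below is a hypothetical BMUF-style illustration, not necessarily the paper's exact update.

```python
import numpy as np

def block_momentum_round(theta_global, worker_thetas, m, beta=0.9, block_lr=1.0):
    # Treat the averaged model delta from one round as a pseudo-gradient
    # and smooth it with a momentum buffer (hypothetical BMUF-style rule;
    # the paper's exact blockwise update may differ).
    avg_theta = np.mean(worker_thetas, axis=0)
    delta = avg_theta - theta_global      # aggregated block update
    m = beta * m + delta                  # momentum over block updates
    theta_global = theta_global + block_lr * m
    return theta_global, m
```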
arXiv Detail & Related papers (2021-10-01T19:23:18Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
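A minimal sketch of SEG with iterate averaging on the bilinear game min_x max_y x^T A y follows; the additive-noise oracle, constant step size, and omission of the scheduled restart procedure are simplifying assumptions relative to the paper.

```python
import numpy as np

def seg_bilinear(A, steps=2000, eta=0.1, noise_std=0.01, seed=0):
    # Stochastic extragradient with iterate averaging on min_x max_y x^T A y,
    # whose Nash equilibrium is (0, 0) for generic A.
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = rng.standard_normal(n), rng.standard_normal(m)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for t in range(1, steps + 1):
        gx = A @ y + noise_std * rng.standard_normal(n)    # grad in x
        gy = A.T @ x + noise_std * rng.standard_normal(m)  # grad in y
        x_half, y_half = x - eta * gx, y + eta * gy        # extrapolation
        gx2 = A @ y_half + noise_std * rng.standard_normal(n)
        gy2 = A.T @ x_half + noise_std * rng.standard_normal(m)
        x, y = x - eta * gx2, y + eta * gy2                # update step
        x_avg += (x - x_avg) / t                           # running average
        y_avg += (y - y_avg) / t
    return x_avg, y_avg
```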
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to
Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
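The gist can be sketched as two momentum buffers updated on alternating iterations and combined with a positive and a negative weight, which inflates the gradient-noise component without biasing the mean direction. The coefficients and missing normalization below are assumptions for illustration, not the paper's exact update.

```python
import numpy as np

def pnm_step(theta, grad, m_even, m_odd, t, lr=0.01, beta=0.9, beta0=1.0):
    # Update the buffer matching the parity of t, then combine the fresh
    # buffer with a positive weight and the stale one with a negative
    # weight (illustrative coefficients only).
    if t % 2 == 0:
        m_even = beta * m_even + (1.0 - beta) * grad
        m_new, m_old = m_even, m_odd
    else:
        m_odd = beta * m_odd + (1.0 - beta) * grad
        m_new, m_old = m_odd, m_even
    direction = (1.0 + beta0) * m_new - beta0 * m_old
    return theta - lr * direction, m_even, m_odd
```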
arXiv Detail & Related papers (2021-03-31T16:08:06Z) - Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods
in Paradigmatic High-Dimensional Non-Convex Problem [12.132641563193584]
When optimizing over loss functions, it is common practice to use momentum-based methods rather than vanilla gradient descent.
We show how adding a mass term increases the effective step size of the dynamics, leading to a speed-up.
arXiv Detail & Related papers (2021-02-23T15:30:57Z) - SMG: A Shuffling Gradient-Based Method with Momentum [25.389545522794172]
We combine two advanced ideas widely used in optimization for machine learning.
We develop a novel shuffling-based momentum technique.
Our tests have shown encouraging performance of the new algorithms.
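A rough sketch of one epoch of a shuffling gradient method with momentum is given below; the per-epoch momentum anchor and the step-size scaling are illustrative assumptions rather than the exact SMG update.

```python
import numpy as np

def smg_style_epoch(theta, grad_i, n, m, lr=0.05, beta=0.5, rng=None):
    # One epoch: visit the n component functions in a fresh random order,
    # mixing each per-sample gradient with a momentum anchor that is only
    # refreshed at the end of the epoch (scalings are illustrative).
    rng = np.random.default_rng() if rng is None else rng
    g_sum = np.zeros_like(theta)
    for i in rng.permutation(n):
        g = grad_i(theta, i)              # gradient of the i-th component
        g_sum += g
        theta = theta - (lr / n) * (beta * m + (1.0 - beta) * g)
    return theta, g_sum / n               # new per-epoch momentum anchor
```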
arXiv Detail & Related papers (2020-11-24T04:12:35Z) - Reintroducing Straight-Through Estimators as Principled Methods for
Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights.
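The vanilla straight-through estimator itself is easy to state: binarize on the forward pass and pass the gradient through the sign nonlinearity as if it were a (clipped) identity on the backward pass. A minimal framework-independent sketch:

```python
import numpy as np

def st_forward(w):
    # Forward pass: deterministic binarization to {-1, +1}.
    return np.sign(w)

def st_backward(w, grad_out, clip=True):
    # Straight-through estimator: treat sign() as the identity on the
    # backward pass, optionally zeroing the gradient where |w| > 1 (the
    # common "clipped" variant). The SBN derivation with Bernoulli weights
    # gives this a principled reading rather than a pure heuristic.
    if clip:
        return grad_out * (np.abs(w) <= 1.0)
    return grad_out
```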
arXiv Detail & Related papers (2020-06-11T23:58:18Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
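The clipping operator at the core of such methods rescales a stochastic gradient to a bounded norm. A minimal sketch follows; clipped-SSTM additionally wraps this operator in an accelerated similar-triangles update with a step-size and clip-level schedule that this sketch does not reproduce.

```python
import numpy as np

def clip_gradient(g, clip_level):
    # Rescale g so its Euclidean norm is at most clip_level; below the
    # threshold the gradient passes through unchanged.
    norm = np.linalg.norm(g)
    return g * (clip_level / norm) if norm > clip_level else g
```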
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
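The feature-expansion idea is that a shift-invariant kernel can be approximated by an explicit finite feature map, so GP regression reduces to linear algebra in the feature space. The sketch below uses standard random Fourier features for an RBF kernel as a stand-in; the paper's quadrature-based features are deterministic and come with stronger guarantees, including for derivatives.

```python
import numpy as np

def random_fourier_features(X, n_features=256, lengthscale=1.0, seed=0):
    # Explicit feature map phi with k(x, x') ~ phi(x) @ phi(x') for an RBF
    # kernel (Rahimi-Recht random features). Shown only as a stand-in for
    # the paper's deterministic quadrature features.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```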
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.