A New Accelerated Stochastic Gradient Method with Momentum
- URL: http://arxiv.org/abs/2006.00423v1
- Date: Sun, 31 May 2020 03:04:32 GMT
- Title: A New Accelerated Stochastic Gradient Method with Momentum
- Authors: Liang Liu and Xiaopeng Luo
- Abstract summary: gradient descent with momentum (Sgdm) use weights that decay exponentially with the iteration times to generate an momentum term.
We provide theoretical convergence properties analyses for our method, which show both the exponentially decay weights and our inverse proportionally decay weights can limit the variance of the moving direction of parameters to be optimized to a region.
- Score: 4.967897656554012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel accelerated stochastic gradient method
with momentum, in which the momentum term is a weighted average of previous
gradients and the weights decay in inverse proportion to the iteration count.
Stochastic gradient descent with momentum (Sgdm) uses weights that decay
exponentially with the iteration count to generate its momentum term. Using
exponentially decaying weights, variants of Sgdm with carefully designed and
more complicated formats have been proposed to achieve better performance. The
momentum update rule of our method is as simple as that of Sgdm. We provide a
theoretical convergence analysis for our method, which shows that both
exponentially decaying weights and our inversely proportionally decaying
weights confine the variance of the moving direction of the optimized
parameters to a bounded region. Experimental results show that our method
performs well on practical problems: it outperforms Sgdm, and it outperforms
Adam on convolutional neural networks.
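The contrast between the two weighting schemes can be made concrete with a short sketch. The Sgdm-style recursion below is standard; the inverse-proportional recursion m_t = (1 - 1/t) m_{t-1} + (1/t) g_t, under which every past gradient's weight in m_t equals 1/t, is an illustrative reading of the abstract, not necessarily the paper's exact coefficients.

```python
import numpy as np

def sgdm_momentum(m, grad, beta=0.9):
    # Sgdm: unrolling m_t = beta * m_{t-1} + (1 - beta) * g_t shows the
    # weight on the gradient from k steps ago is (1 - beta) * beta**k,
    # i.e. it decays exponentially with the iteration count.
    return beta * m + (1.0 - beta) * grad

def inverse_decay_momentum(m, grad, t):
    # Illustrative inverse-proportional weighting (t = 1, 2, ...):
    # m_t = (1 - 1/t) * m_{t-1} + (1/t) * g_t, so the weight of every
    # past gradient in m_t is 1/t and decays inversely with t.
    lam = 1.0 / t
    return (1.0 - lam) * m + lam * grad

# Toy usage on f(theta) = 0.5 * ||theta||^2 with noisy gradients.
rng = np.random.default_rng(0)
theta, m, lr = np.ones(5), np.zeros(5), 0.1
for t in range(1, 101):
    grad = theta + 0.1 * rng.standard_normal(5)  # stochastic gradient
    m = inverse_decay_momentum(m, grad, t)
    theta -= lr * m
```

Under the 1/t recursion the momentum is the plain running average of all past gradients, so each gradient's influence shrinks inversely with the iteration count rather than exponentially.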
Related papers
- Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
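Spectral normalization in general rescales a weight matrix so its largest singular value is bounded, which caps the Lipschitz constant of the corresponding layer. A minimal power-iteration sketch follows; the paper's specific application to long model unrolls in RP PGMs is not reproduced here.

```python
import numpy as np

def spectral_normalize(W, n_iters=20, seed=0):
    # Estimate the largest singular value of W by power iteration and
    # rescale W so that value is at most 1 (a sketch of the generic
    # technique only; the paper's use inside model-based RP PGMs has
    # its own details and hyperparameters).
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # spectral-norm estimate
    return W / max(sigma, 1.0)

W = np.random.default_rng(1).standard_normal((64, 32))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # ~1.0 when the original norm exceeded 1
```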
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Last-iterate convergence analysis of stochastic momentum methods for
neural networks [3.57214198937538]
The momentum method is used to solve large-scale optimization problems in neural networks.
Existing convergence results for momentum methods are obtained under artificial settings.
The momentum factor can be fixed to a constant, rather than varying with time as required in existing analyses.
arXiv Detail & Related papers (2022-05-30T02:17:44Z) - Momentum Doesn't Change the Implicit Bias [36.301490759243876]
We analyze the implicit bias of momentum-based optimization.
We construct new Lyapunov functions as a tool to analyze the gap between the model parameter and the max-margin solution.
arXiv Detail & Related papers (2021-10-08T04:37:18Z) - Accelerate Distributed Stochastic Descent for Nonconvex Optimization
with Momentum [12.324457683544132]
We propose a momentum method for model-averaging approaches to distributed training.
We analyze the convergence and scaling properties of such momentum methods.
Our experimental results show that block momentum not only accelerates training, but also achieves better results.
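The idea can be sketched as blockwise model-update filtering: treat the aggregated model delta from one round of averaging as a pseudo-gradient and smooth it with a momentum buffer. The rule below is a hypothetical BMUF-style illustration, not necessarily the paper's exact update.

```python
import numpy as np

def block_momentum_round(theta_global, worker_thetas, m, beta=0.9, block_lr=1.0):
    # Treat the averaged model delta from one round as a pseudo-gradient
    # and smooth it with a momentum buffer (hypothetical BMUF-style rule;
    # the paper's exact blockwise update may differ).
    avg_theta = np.mean(worker_thetas, axis=0)
    delta = avg_theta - theta_global      # aggregated block update
    m = beta * m + delta                  # momentum over block updates
    theta_global = theta_global + block_lr * m
    return theta_global, m
```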
arXiv Detail & Related papers (2021-10-01T19:23:18Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
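A minimal sketch of SEG with iterate averaging on the bilinear game min_x max_y x^T A y follows; the additive-noise oracle, constant step size, and omission of the scheduled restart procedure are simplifying assumptions relative to the paper.

```python
import numpy as np

def seg_bilinear(A, steps=2000, eta=0.1, noise_std=0.01, seed=0):
    # Stochastic extragradient with iterate averaging on min_x max_y x^T A y,
    # whose Nash equilibrium is (0, 0) for generic A.
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = rng.standard_normal(n), rng.standard_normal(m)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for t in range(1, steps + 1):
        gx = A @ y + noise_std * rng.standard_normal(n)    # grad in x
        gy = A.T @ x + noise_std * rng.standard_normal(m)  # grad in y
        x_half, y_half = x - eta * gx, y + eta * gy        # extrapolation
        gx2 = A @ y_half + noise_std * rng.standard_normal(n)
        gy2 = A.T @ x_half + noise_std * rng.standard_normal(m)
        x, y = x - eta * gx2, y + eta * gy2                # update step
        x_avg += (x - x_avg) / t                           # running average
        y_avg += (y - y_avg) / t
    return x_avg, y_avg
```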
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to
Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
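The gist can be sketched as two momentum buffers updated on alternating iterations and combined with a positive and a negative weight, which inflates the gradient-noise component without biasing the mean direction. The coefficients and missing normalization below are assumptions for illustration, not the paper's exact update.

```python
import numpy as np

def pnm_step(theta, grad, m_even, m_odd, t, lr=0.01, beta=0.9, beta0=1.0):
    # Update the buffer matching the parity of t, then combine the fresh
    # buffer with a positive weight and the stale one with a negative
    # weight (illustrative coefficients only).
    if t % 2 == 0:
        m_even = beta * m_even + (1.0 - beta) * grad
        m_new, m_old = m_even, m_odd
    else:
        m_odd = beta * m_odd + (1.0 - beta) * grad
        m_new, m_old = m_odd, m_even
    direction = (1.0 + beta0) * m_new - beta0 * m_old
    return theta - lr * direction, m_even, m_odd
```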
arXiv Detail & Related papers (2021-03-31T16:08:06Z) - Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods
in Paradigmatic High-Dimensional Non-Convex Problem [12.132641563193584]
When optimizing over loss functions, it is common practice to use momentum-based methods rather than vanilla gradient descent.
We show how adding a mass term increases the effective step size of the dynamics, leading to a speed-up.
arXiv Detail & Related papers (2021-02-23T15:30:57Z) - SMG: A Shuffling Gradient-Based Method with Momentum [25.389545522794172]
We combine two advanced ideas widely used in optimization for machine learning.
We develop a novel shuffling-based momentum technique.
Our tests have shown encouraging performance of the new algorithms.
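A rough sketch of one epoch of a shuffling gradient method with momentum is given below; the per-epoch momentum anchor and the step-size scaling are illustrative assumptions rather than the exact SMG update.

```python
import numpy as np

def smg_style_epoch(theta, grad_i, n, m, lr=0.05, beta=0.5, rng=None):
    # One epoch: visit the n component functions in a fresh random order,
    # mixing each per-sample gradient with a momentum anchor that is only
    # refreshed at the end of the epoch (scalings are illustrative).
    rng = np.random.default_rng() if rng is None else rng
    g_sum = np.zeros_like(theta)
    for i in rng.permutation(n):
        g = grad_i(theta, i)              # gradient of the i-th component
        g_sum += g
        theta = theta - (lr / n) * (beta * m + (1.0 - beta) * g)
    return theta, g_sum / n               # new per-epoch momentum anchor
```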
arXiv Detail & Related papers (2020-11-24T04:12:35Z) - Reintroducing Straight-Through Estimators as Principled Methods for
Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights.
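The vanilla straight-through estimator itself is easy to state: binarize on the forward pass and pass the gradient through the sign nonlinearity as if it were a (clipped) identity on the backward pass. A minimal framework-independent sketch:

```python
import numpy as np

def st_forward(w):
    # Forward pass: deterministic binarization to {-1, +1}.
    return np.sign(w)

def st_backward(w, grad_out, clip=True):
    # Straight-through estimator: treat sign() as the identity on the
    # backward pass, optionally zeroing the gradient where |w| > 1 (the
    # common "clipped" variant). The SBN derivation with Bernoulli weights
    # gives this a principled reading rather than a pure heuristic.
    if clip:
        return grad_out * (np.abs(w) <= 1.0)
    return grad_out
```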
arXiv Detail & Related papers (2020-06-11T23:58:18Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
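The clipping operator at the core of such methods rescales a stochastic gradient to a bounded norm. A minimal sketch follows; clipped-SSTM additionally wraps this operator in an accelerated similar-triangles update with a step-size and clip-level schedule that this sketch does not reproduce.

```python
import numpy as np

def clip_gradient(g, clip_level):
    # Rescale g so its Euclidean norm is at most clip_level; below the
    # threshold the gradient passes through unchanged.
    norm = np.linalg.norm(g)
    return g * (clip_level / norm) if norm > clip_level else g
```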
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
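The feature-expansion idea is that a shift-invariant kernel can be approximated by an explicit finite feature map, so GP regression reduces to linear algebra in the feature space. The sketch below uses standard random Fourier features for an RBF kernel as a stand-in; the paper's quadrature-based features are deterministic and come with stronger guarantees, including for derivatives.

```python
import numpy as np

def random_fourier_features(X, n_features=256, lengthscale=1.0, seed=0):
    # Explicit feature map phi with k(x, x') ~ phi(x) @ phi(x') for an RBF
    # kernel (Rahimi-Recht random features). Shown only as a stand-in for
    # the paper's deterministic quadrature features.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```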
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.