SMG: A Shuffling Gradient-Based Method with Momentum
- URL: http://arxiv.org/abs/2011.11884v3
- Date: Wed, 9 Jun 2021 13:50:24 GMT
- Title: SMG: A Shuffling Gradient-Based Method with Momentum
- Authors: Trang H. Tran, Lam M. Nguyen, Quoc Tran-Dinh
- Abstract summary: We combine two advanced ideas widely used in optimization for machine learning.
We develop a novel shuffling-based momentum technique.
Our tests have shown encouraging performance of the new algorithms.
- Score: 25.389545522794172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We combine two advanced ideas widely used in optimization for machine
learning: shuffling strategy and momentum technique to develop a novel
shuffling gradient-based method with momentum, coined Shuffling Momentum
Gradient (SMG), for non-convex finite-sum optimization problems. While our
method is inspired by momentum techniques, its update is fundamentally
different from existing momentum-based methods. We establish state-of-the-art
convergence rates of SMG for any shuffling strategy using either constant or
diminishing learning rate under standard assumptions (i.e., $L$-smoothness and
bounded variance). When the shuffling strategy is fixed, we develop another new
algorithm that is similar to existing momentum methods, and prove the same
convergence rates for this algorithm under the $L$-smoothness and bounded
gradient assumptions. We demonstrate our algorithms via numerical simulations
on standard datasets and compare them with existing shuffling methods. Our
tests have shown encouraging performance of the new algorithms.
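To make the epoch-wise structure concrete, below is a minimal NumPy sketch of a shuffling momentum update in the spirit of SMG: the data order is reshuffled once per epoch, the momentum anchor stays frozen for the whole epoch, and the buffer for the next epoch accumulates the average of the current epoch's component gradients. This is our paraphrase under stated assumptions (the helper `grad_i` and the exact accumulation rule); consult Algorithm 1 in the paper for the precise recursion.

```python
import numpy as np

def smg_epoch(grad_i, x, m_prev, n, lr, beta=0.5, rng=None):
    """One epoch of a shuffling momentum update in the spirit of SMG.

    grad_i(x, i) is an assumed helper returning the gradient of the i-th
    component f_i at x. The momentum anchor m_prev stays frozen for the
    whole epoch, and the buffer for the next epoch accumulates the
    average of this epoch's component gradients (our paraphrase of the
    paper's Algorithm 1, not a verbatim transcription).
    """
    rng = rng if rng is not None else np.random.default_rng()
    perm = rng.permutation(n)            # reshuffle once per epoch
    m_next = np.zeros_like(x)
    for i in perm:
        g = grad_i(x, i)
        m_next += g / n                  # running average over the epoch
        d = beta * m_prev + (1.0 - beta) * g
        x = x - (lr / n) * d             # per-iteration step, scaled by 1/n
    return x, m_next
```

Feeding `x` and `m_next` back in once per epoch reproduces the epoch-wise loop the paper analyzes.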
Related papers
- Shuffling Momentum Gradient Algorithm for Convex Optimization [22.58278411628813]
The Stochastic Gradient Descent method (SGD) and its variants have become the methods of choice for solving finite-sum optimization problems arising in machine learning and data science.
We provide the first analysis of shuffling momentum-based methods for the strongly convex setting.
arXiv Detail & Related papers (2024-03-05T18:19:02Z)
- Modified Gauss-Newton Algorithms under Noise [2.0454959820861727]
Modified Gauss-Newton or prox-linear algorithms can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings.
We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction.
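For contrast with a plain gradient step, here is a minimal damped Gauss-Newton step for nonlinear least squares. This is the classical update, not the paper's modified, noise-aware variant; `residual` and `jacobian` are assumed callables.

```python
import numpy as np

def gauss_newton_step(residual, jacobian, x, damping=1e-6):
    """One damped Gauss-Newton step for min_x 0.5 * ||r(x)||^2.

    residual(x) -> r in R^m and jacobian(x) -> J in R^{m x n} are assumed
    callables; damping adds a small Levenberg-style regularizer. This is
    the classical step, not the paper's modified, noise-aware variant.
    """
    r = residual(x)
    J = jacobian(x)
    # Solve (J^T J + damping * I) dx = -J^T r for the step dx.
    H = J.T @ J + damping * np.eye(x.size)
    dx = np.linalg.solve(H, -(J.T @ r))
    return x + dx
```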
arXiv Detail & Related papers (2023-05-18T01:10:42Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
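A minimal sketch of how antisymmetry yields exact momentum conservation: antisymmetrizing any (possibly learned) pairwise interaction forces f_ij = -f_ji, so all velocity updates cancel in the sum. This mirrors the idea behind the paper's antisymmetrical convolutions, not their exact layer construction; equal particle masses are assumed.

```python
import numpy as np

def momentum_conserving_update(pos, vel, pair_force, dt=0.01):
    """Velocity update built from antisymmetrized pairwise interactions.

    pair_force(xi, xj) -> R^d may be any function, e.g. a learned one.
    Antisymmetrization enforces f_ij = -f_ji, so all pairwise terms
    cancel in the sum and total linear momentum is conserved exactly
    (equal particle masses assumed; this mirrors the idea of the
    paper's antisymmetrical convolutions, not its exact layers).
    """
    n = pos.shape[0]
    dv = np.zeros_like(vel)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            f_ij = 0.5 * (pair_force(pos[i], pos[j]) - pair_force(pos[j], pos[i]))
            dv[i] += dt * f_ij
    # dv sums to zero up to round-off, so sum(vel) is unchanged.
    return vel + dv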
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Formal guarantees for heuristic optimization algorithms used in machine learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale machine learning (ML) optimization problems.
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best-known complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
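The core Bregman proximal step can be illustrated on the probability simplex, where choosing the KL divergence as the Bregman distance yields the exponentiated-gradient update below; the paper's BGPO and VR-BGPO build momentum-based variance reduction on top of such steps.

```python
import numpy as np

def bregman_step_simplex(p, grad, lr=0.1):
    """One Bregman (mirror-descent) step on the probability simplex.

    With the KL divergence as the Bregman distance, the proximal step
    argmin_q <grad, q> + (1/lr) * KL(q, p) has the closed form below
    (exponentiated gradient). Entries of p are assumed positive.
    """
    logits = np.log(p) - lr * grad
    logits -= logits.max()               # for numerical stability
    q = np.exp(logits)
    return q / q.sum()
```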
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating of the parameters and the learning rate as a maxmin problem.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns a sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
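For reference, a standard two-point ZO gradient estimator is sketched below using i.i.d. Gaussian perturbations; ZO-RL's contribution is to replace this random sampling with a learned policy over the `directions` argument.

```python
import numpy as np

def zo_gradient(f, x, directions, mu=1e-4):
    """Two-point zeroth-order gradient estimate averaged over the given
    perturbation directions (a (k, d) array).

    Random Gaussian directions are the usual default; ZO-RL's point is
    that learning where the directions come from can reduce variance.
    """
    g = np.zeros_like(x)
    for u in directions:
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / len(directions)

# Example with random, non-learned directions:
rng = np.random.default_rng(0)
x = np.ones(5)
dirs = rng.standard_normal((16, x.size))
g_hat = zo_gradient(lambda z: float(np.sum(z ** 2)), x, dirs)
```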
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
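A schematic of the positive-negative combination as we read it: two momentum buffers fed by alternating mini-batches are combined with weights (1 + beta0) and -beta0, preserving the mean direction while amplifying the noise component; the paper's exact normalization may differ.

```python
import numpy as np

def pnm_step(x, grad, state, lr=0.01, beta=0.9, beta0=1.0):
    """Schematic Positive-Negative Momentum step (our paraphrase).

    Two momentum buffers are fed by alternating mini-batches and combined
    with weights (1 + beta0) and -beta0. The expected direction matches
    ordinary momentum while the noise component is amplified, emulating
    stronger SGN; the paper's exact normalization may differ.
    """
    k = state["t"] % 2                        # buffer fed by this batch
    state["m"][k] = beta * state["m"][k] + (1.0 - beta) * grad
    pos, neg = state["m"][k], state["m"][1 - k]
    d = (1.0 + beta0) * pos - beta0 * neg     # positive-negative combination
    state["t"] += 1
    return x - lr * d

# state = {"m": [np.zeros_like(x), np.zeros_like(x)], "t": 0}
```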
arXiv Detail & Related papers (2021-03-31T16:08:06Z)
- Differentially Private Accelerated Optimization Algorithms [0.7874708385247353]
We present two classes of differentially private optimization algorithms.
The first algorithm is inspired by Polyak's heavy ball method.
The second class of algorithms are based on Nesterov's accelerated gradient method.
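A generic sketch of the Gaussian-mechanism pattern behind such methods, here applied to Polyak's heavy ball: clip the gradient to bound sensitivity, add calibrated noise, then take the momentum step. This is not the paper's exact algorithm or privacy accounting; `noise_std` must be calibrated from the clipping bound and the target (epsilon, delta).

```python
import numpy as np

def dp_heavy_ball_step(x, x_prev, grad, lr=0.1, momentum=0.9,
                       clip=1.0, noise_std=0.5, rng=None):
    """One heavy-ball step privatized by the standard Gaussian mechanism:
    clip the gradient to bound its sensitivity, then add Gaussian noise.

    A generic sketch, not the paper's exact algorithm; noise_std must be
    calibrated from the clipping bound and the target (epsilon, delta).
    """
    rng = rng if rng is not None else np.random.default_rng()
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))  # clip
    g = g + rng.normal(0.0, noise_std, size=g.shape)            # privatize
    x_next = x - lr * g + momentum * (x - x_prev)               # Polyak step
    return x_next, x
```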
arXiv Detail & Related papers (2020-08-05T08:23:01Z)
- Momentum-Based Policy Gradient Methods [133.53164856723782]
We propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning.
In particular, we present a non-adaptive version of the IS-MBPG method, which also reaches the best-known sample complexity of $O(\epsilon^{-3})$ without requiring large batches.
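For orientation, a plain heavy-ball update on a REINFORCE-style estimate is sketched below; the paper's IS-MBPG additionally corrects the momentum term with importance sampling, which this baseline omits.

```python
import numpy as np

def momentum_policy_gradient_step(theta, grad_logp, returns, m,
                                  lr=0.01, beta=0.9):
    """Heavy-ball update on a REINFORCE-style gradient estimate.

    grad_logp is a (T, d) array of per-step score functions for one
    sampled trajectory and returns is the (T,) reward-to-go. This plain
    momentum baseline omits the importance-sampling correction that
    IS-MBPG uses to obtain its variance reduction.
    """
    g_hat = (grad_logp * returns[:, None]).sum(axis=0)  # PG estimate
    m = beta * m + (1.0 - beta) * g_hat                 # momentum buffer
    theta = theta + lr * m                              # ascent on return
    return theta, m
```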
arXiv Detail & Related papers (2020-07-13T20:44:15Z)
- IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
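A centralized, non-accelerated sketch of the sub-problem structure described above: each outer round approximately minimizes the augmented Lagrangian in the primal variable and then takes a dual ascent step on the multipliers. IDEAL runs this pattern decentralized and with acceleration; `grad_f` and the constraint `Ax = b` are illustrative assumptions.

```python
import numpy as np

def inexact_augmented_lagrangian_round(grad_f, A, b, x, y, rho=1.0,
                                       inner_steps=50, inner_lr=0.05):
    """One outer round of an inexact augmented Lagrangian method for
    min f(x) subject to Ax = b.

    The inner loop only approximately minimizes the augmented Lagrangian
    (hence "inexact"); the multipliers y then take a dual ascent step.
    This sketch is centralized and non-accelerated, unlike IDEAL itself.
    """
    for _ in range(inner_steps):
        # Gradient of f(x) + y^T (Ax - b) + (rho/2) * ||Ax - b||^2 in x.
        g = grad_f(x) + A.T @ (y + rho * (A @ x - b))
        x = x - inner_lr * g
    y = y + rho * (A @ x - b)                # multiplier (dual) update
    return x, y
```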
arXiv Detail & Related papers (2020-06-11T18:49:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.