On the fast convergence of minibatch heavy ball momentum
- URL: http://arxiv.org/abs/2206.07553v4
- Date: Wed, 13 Dec 2023 01:37:55 GMT
- Title: On the fast convergence of minibatch heavy ball momentum
- Authors: Raghu Bollapragada, Tyler Chen, Rachel Ward
- Abstract summary: We show that minibatch stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, provided the batch size is sufficiently large.
The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum.
- Score: 6.154018226934517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simple stochastic momentum methods are widely used in machine learning
optimization, but their good practical performance is at odds with an absence
of theoretical guarantees of acceleration in the literature. In this work, we
aim to close the gap between theory and practice by showing that stochastic
heavy ball momentum retains the fast linear rate of (deterministic) heavy ball
momentum on quadratic optimization problems, at least when minibatching with a
sufficiently large batch size. The algorithm we study can be interpreted as an
accelerated randomized Kaczmarz algorithm with minibatching and heavy ball
momentum. The analysis relies on carefully decomposing the momentum transition
matrix, and using new spectral norm concentration bounds for products of
independent random matrices. We provide numerical illustrations demonstrating
that our bounds are reasonably sharp.
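For concreteness, here is a minimal sketch of the algorithm described above on a consistent linear system (an illustrative implementation, not the authors' exact scheme; the step size alpha, momentum beta, and uniform without-replacement row sampling are assumptions):
```python
import numpy as np

def minibatch_heavy_ball(A, b, batch_size, alpha, beta, n_iters, seed=0):
    """Minibatch heavy ball momentum on f(x) = ||Ax - b||^2 / (2m).

    Each iteration samples `batch_size` rows of A uniformly without replacement,
    takes a gradient step on the resulting minibatch quadratic, and adds the
    heavy ball momentum term beta * (x_k - x_{k-1})."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_prev = np.zeros(n)
    x = np.zeros(n)
    for _ in range(n_iters):
        rows = rng.choice(m, size=batch_size, replace=False)
        A_S, b_S = A[rows], b[rows]
        grad = A_S.T @ (A_S @ x - b_S) / batch_size          # unbiased minibatch gradient
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x

# Toy consistent system Ax = b: the iterates converge to x_star.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_star = rng.standard_normal(50)
b = A @ x_star
x_hat = minibatch_heavy_ball(A, b, batch_size=100, alpha=0.01, beta=0.9, n_iters=2000)
print(np.linalg.norm(x_hat - x_star))
```
Because the system is consistent, every minibatch gradient vanishes at x_star, so the iterates can converge to the exact solution rather than to a noise ball; this is the randomized-Kaczmarz setting referred to in the abstract.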
Related papers
- Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
It is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings.
We show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate.
This means SGD with heavy-ball momentum is useful in the large-batch settings such as distributed machine learning or federated learning.
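For context, the comparison behind this claim is the standard gap between unaccelerated and accelerated rates on a $\mu$-strongly convex, $L$-smooth quadratic with condition number $\kappa = L/\mu$ (a textbook statement, not a quote of this paper's exact bound):
```latex
\text{gradient descent: } \mathcal{O}\!\left(\kappa \log \tfrac{1}{\varepsilon}\right) \text{ iterations},
\qquad
\text{heavy ball momentum: } \tilde{\mathcal{O}}\!\left(\sqrt{\kappa}\, \log \tfrac{1}{\varepsilon}\right) \text{ iterations}.
```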
arXiv Detail & Related papers (2023-12-22T09:58:39Z) - The Marginal Value of Momentum for Small Learning Rate SGD [20.606430391298815]
Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise.
Experiments show that momentum indeed has limited benefits for both optimization and generalization in practical training where the optimal learning rate is not very large.
arXiv Detail & Related papers (2023-07-27T21:01:26Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to observations of the state.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash
Equilibrium [62.51015395213579]
We consider nonconvex-nonconcave objective functions in two-player zero-sum continuous games.
The proposed algorithm employs the movements of particles to represent the updates of random strategies for the $\epsilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
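For readers unfamiliar with the particle-based baseline mentioned above, here is a minimal SVGD step (the standard comparison algorithm, not MIED itself; the RBF bandwidth, step size, and Gaussian toy target are illustrative choices):
```python
import numpy as np

def svgd_step(particles, grad_log_p, step=0.05, bandwidth=1.0):
    """One Stein variational gradient descent (SVGD) update with an RBF kernel
    k(x, y) = exp(-||x - y||^2 / bandwidth).

    particles: (n, d) array of samples; grad_log_p maps (n, d) -> (n, d) scores."""
    n = len(particles)
    diffs = particles[:, None, :] - particles[None, :, :]   # diffs[j, i] = x_j - x_i
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / bandwidth)    # symmetric kernel matrix
    scores = grad_log_p(particles)
    attract = K @ scores                                    # sum_j k(x_j, x_i) * score(x_j)
    repulse = -(2.0 / bandwidth) * np.einsum("jid,ji->id", diffs, K)  # sum_j grad_{x_j} k(x_j, x_i)
    return particles + step * (attract + repulse) / n

# Toy target: standard 2-D Gaussian, whose score is simply -x.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 2))
for _ in range(500):
    x = svgd_step(x, lambda p: -p)
print(x.mean(axis=0), x.std(axis=0))  # roughly mean 0 and standard deviation 1
```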
arXiv Detail & Related papers (2022-10-24T16:54:18Z) - Last-iterate convergence analysis of stochastic momentum methods for
neural networks [3.57214198937538]
The momentum method is widely used to solve large-scale optimization problems arising in neural network training.
However, current convergence results for stochastic momentum methods are established under artificial or restrictive settings.
In our analysis, the momentum factor can be fixed to a constant, rather than varying with time as in existing results.
arXiv Detail & Related papers (2022-05-30T02:17:44Z) - Momentum Doesn't Change the Implicit Bias [36.301490759243876]
We analyze the implicit bias of momentum-based optimization.
We construct new Lyapunov functions as a tool to analyze the gap between the model parameter and the max-margin solution.
arXiv Detail & Related papers (2021-10-08T04:37:18Z) - Accelerate Distributed Stochastic Descent for Nonconvex Optimization
with Momentum [12.324457683544132]
We propose a momentum method for model averaging approaches to distributed training.
We analyze the convergence and scaling properties of such momentum methods.
Our experimental results show that block momentum not only accelerates training, but also achieves better results.
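A rough sketch of applying momentum at the level of model-averaging rounds (an illustrative reading of "block momentum", not the paper's exact algorithm; the function and parameter names are hypothetical):
```python
import numpy as np

def block_momentum_round(global_w, worker_weights, velocity, block_lr=1.0, block_momentum=0.9):
    """One synchronization round: average the workers' locally trained weights,
    then apply momentum to the aggregated (block-level) update.

    worker_weights: list of parameter vectors produced by local SGD since the
    last round; velocity: running momentum buffer over rounds."""
    delta = np.mean(worker_weights, axis=0) - global_w       # average block-level progress
    velocity = block_momentum * velocity + block_lr * delta  # momentum across rounds, not minibatches
    return global_w + velocity, velocity
```
Between rounds each worker runs ordinary local SGD from the current global weights; the momentum is accumulated over the averaged round-level updates rather than over individual minibatch gradients.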
arXiv Detail & Related papers (2021-10-01T19:23:18Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
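The extra-gradient template referenced here, in its generic deterministic form (not the paper's variance-reduced DRSL instantiation; the bilinear toy problem and step size are illustrative):
```python
def extragradient(grad_x, grad_y, x, y, step=0.1, n_iters=2000):
    """Generic extra-gradient iterations for min_x max_y f(x, y).

    grad_x, grad_y return the partial gradients of f; each iteration first
    extrapolates to a half-point, then updates using gradients at that point."""
    for _ in range(n_iters):
        x_half = x - step * grad_x(x, y)          # extrapolation (descent on x)
        y_half = y + step * grad_y(x, y)          # extrapolation (ascent on y)
        x = x - step * grad_x(x_half, y_half)     # update with look-ahead gradients
        y = y + step * grad_y(x_half, y_half)
    return x, y

# Bilinear toy problem f(x, y) = x * y with saddle point (0, 0): plain
# gradient descent-ascent spirals outward here, while extra-gradient converges.
x, y = extragradient(lambda x, y: y, lambda x, y: x, x=1.0, y=1.0)
print(x, y)
```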
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Fast and differentiable simulation of driven quantum systems [58.720142291102135]
We introduce a semi-analytic method based on the Dyson expansion that allows us to time-evolve driven quantum systems much faster than standard numerical methods.
We show results of the optimization of a two-qubit gate using transmon qubits in the circuit QED architecture.
arXiv Detail & Related papers (2020-12-16T21:43:38Z) - Fast Gravitational Approach for Rigid Point Set Registration with
Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA).
In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field.
We show that the new method class has characteristics not found in previous alignment methods.
arXiv Detail & Related papers (2020-09-28T15:05:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.