$\mu^2$-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
- URL: http://arxiv.org/abs/2304.04172v1
- Date: Sun, 9 Apr 2023 06:18:34 GMT
- Title: $\mu^2$-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
- Authors: Kfir Y. Levy
- Abstract summary: We consider stochastic convex optimization problems where the objective is an expectation over smooth functions.
We suggest a novel gradient estimate that combines two recent mechanisms related to the notion of momentum.
We show that the resulting approaches obtain the optimal convergence rates in both the noiseless and noisy cases.
- Score: 12.709177728330399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider stochastic convex optimization problems where the objective is an expectation over smooth functions. For this setting we suggest a novel gradient estimate that combines two recent mechanisms related to the notion of momentum. Then, we design an SGD-style algorithm as well as an accelerated version that make use of this new estimator, and demonstrate the robustness of these new approaches to the choice of the learning rate. Concretely, we show that these approaches obtain the optimal convergence rates for both the noiseless and the noisy case with the same fixed learning rate. Moreover, for the noisy case we show that these approaches achieve the same optimal bound for a very wide range of learning rates.
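The abstract leaves the estimator unspecified; the following is a minimal sketch of how such a double-momentum update could look, assuming the two mechanisms are Anytime-style weighted averaging of the iterates (used as query points) and a STORM-style corrected-momentum gradient estimate. The helper names (`stoch_grad`, `draw_sample`) and the specific weight choices are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def mu2_sgd_sketch(stoch_grad, draw_sample, w0, T, eta):
    """Hedged sketch of an SGD-style loop with a double-momentum gradient estimate.

    stoch_grad(x, z): stochastic gradient of the objective at x on sample z.
    draw_sample():    draws one fresh i.i.d. sample z.
    """
    w = np.array(w0, dtype=float)        # SGD iterate
    x = w.copy()                         # weighted-average query point
    d = stoch_grad(x, draw_sample())     # initial gradient estimate
    alpha_sum = 1.0
    for t in range(1, T + 1):
        alpha_t = t + 1.0                          # linearly growing averaging weight (assumption)
        w = w - eta * alpha_t * d                  # SGD step driven by the estimate d
        x_prev = x
        alpha_sum += alpha_t
        x = x + (alpha_t / alpha_sum) * (w - x)    # running weighted average of the iterates
        z = draw_sample()                          # reuse one sample at both query points
        beta_t = 1.0 / (t + 1.0)                   # correction weight (assumption)
        d = stoch_grad(x, z) + (1.0 - beta_t) * (d - stoch_grad(x_prev, z))
    return x
```

Intuitively, the correction term keeps the variance of `d` small while the averaged query points move slowly, which is consistent with the robustness to the learning-rate choice described in the abstract.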
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
- Rapid quantum approaches for combinatorial optimisation inspired by optimal state-transfer [3.591122855617648]
We propose a new design to tackle optimisation problems, inspired by Hamiltonians for optimal state-transfer.
We provide numerical evidence of the success of this new design.
arXiv Detail & Related papers (2023-01-17T12:45:09Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Minibatch and Momentum Model-based Methods for Stochastic Non-smooth Non-convex Optimization [3.4809730725241597]
We make two important extensions to model-based methods.
First, we propose a new minibatch method which takes a set of samples to approximate the model function in each iteration.
Second, motivated by the success of momentum techniques, we propose a new momentum-based variant of the model-based approach.
arXiv Detail & Related papers (2021-06-06T05:31:57Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise (a minimal clipping sketch follows this list).
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
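Several of the entries above (smoothed clipping, clipped-SSTM, SGD with clipping) rely on the standard gradient-clipping operator. As a point of reference, here is a minimal, hedged sketch of that operator inside a plain SGD loop; the threshold `lam` and the loop itself are illustrative and not taken from any of the papers.

```python
import numpy as np

def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_sgd(stoch_grad, w0, T, eta, lam):
    """Plain SGD with clipped stochastic gradients (illustrative sketch only)."""
    w = np.array(w0, dtype=float)
    for _ in range(T):
        w = w - eta * clip(stoch_grad(w), lam)
    return w
```

Clipping bounds the influence of any single heavy-tailed gradient sample, which is why it underlies the high-probability and heavy-tailed-noise results cited above.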
This list is automatically generated from the titles and abstracts of the papers on this site.