$\mu^2$-SGD: Stable Stochastic Optimization via a Double Momentum
Mechanism
- URL: http://arxiv.org/abs/2304.04172v1
- Date: Sun, 9 Apr 2023 06:18:34 GMT
- Title: $\mu^2$-SGD: Stable Stochastic Optimization via a Double Momentum
Mechanism
- Authors: Kfir Y. Levy
- Abstract summary: We consider convex optimization problems where the objective is an expectation over smooth functions.
We suggest a novel gradient estimate that combines two recent mechanism that are related to notion of momentum.
We show that these approaches obtain the optimal convergence rates for both noiseless and noisy case.
- Score: 12.709177728330399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider stochastic convex optimization problems where the objective is an
expectation over smooth functions. For this setting we suggest a novel gradient
estimate that combines two recent mechanism that are related to notion of
momentum. Then, we design an SGD-style algorithm as well as an accelerated
version that make use of this new estimator, and demonstrate the robustness of
these new approaches to the choice of the learning rate. Concretely, we show
that these approaches obtain the optimal convergence rates for both noiseless
and noisy case with the same choice of fixed learning rate. Moreover, for the
noisy case we show that these approaches achieve the same optimal bound for a
very wide range of learning rates.
Related papers
- Rapid quantum approaches for combinatorial optimisation inspired by
optimal state-transfer [3.591122855617648]
We propose a new design to tackle optimisation problems, inspired by Hamiltonians for optimal state-transfer.
We provide numerical evidence of the success of this new design.
arXiv Detail & Related papers (2023-01-17T12:45:09Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Minibatch and Momentum Model-based Methods for Stochastic Non-smooth
Non-convex Optimization [3.4809730725241597]
We make two important extensions to model-based methods.
First, we propose a new minibatch which takes a set of samples to approximate the model function in each iteration.
Second, by the success of momentum techniques we propose a new convex-based model.
arXiv Detail & Related papers (2021-06-06T05:31:57Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.