META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for
Unbounded Functions
- URL: http://arxiv.org/abs/2209.14853v1
- Date: Thu, 29 Sep 2022 15:12:54 GMT
- Title: META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for
Unbounded Functions
- Authors: Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy L. Nguyen
- Abstract summary: Recent work overcomes the drawback of having to compute gradients of "mega-batches" that earlier variance reduction methods rely on. The proposed algorithm is competitive with widely used methods on common deep learning tasks.
- Score: 23.746620619512573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the application of variance reduction (VR) techniques to general
non-convex stochastic optimization problems. In this setting, the recent work
STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute
gradients of "mega-batches" that earlier VR methods rely on. There, STORM
utilizes recursive momentum to achieve the VR effect and is then later made
fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the
requirement for obtaining certain problem-specific parameters such as the
smoothness of the objective and bounds on the variance and norm of the
stochastic gradients in order to set the step size. However, STORM+ crucially
relies on the assumption that the function values are bounded, excluding a
large class of useful functions. In this work, we propose META-STORM, a
generalized framework of STORM+ that removes this bounded function values
assumption while still attaining the optimal convergence rate for non-convex
optimization. META-STORM not only maintains full-adaptivity, removing the need
to obtain problem-specific parameters, but also improves the convergence rate's
dependency on the problem parameters. Furthermore, META-STORM can utilize a
large range of parameter settings that subsumes previous methods, allowing for
more flexibility in a wider range of settings. Finally, we demonstrate the
effectiveness of META-STORM through experiments across common deep learning
tasks. Our algorithm improves upon the previous work STORM+ and is competitive
with widely used algorithms after the addition of per-coordinate update and
exponential moving average heuristics.
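For intuition, the recursive-momentum estimator that STORM-style methods build on has the form d_t = g_t + (1 - a_t)(d_{t-1} - g'_t), where g_t is the stochastic gradient at the current iterate and g'_t is the stochastic gradient at the previous iterate evaluated on the same sample. Below is a minimal Python sketch of this mechanism with a simple adaptive step size; the schedules for a_t and the step size here are illustrative placeholders, not the META-STORM or STORM+ settings from the paper.

```python
# Minimal sketch of a STORM-style recursive-momentum update with a simple
# adaptive step size. This illustrates the general mechanism described in the
# abstract; it is NOT the exact META-STORM update rule or parameter schedule.
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x, sample):
    # Toy stochastic gradient of f(x) = 0.5 * ||x||^2 with additive noise.
    # The same `sample` must be reused for both gradient calls within a step.
    return x + sample

def storm_like(x0, steps=1000, c=1.0, eta0=0.1, noise=0.1):
    x_prev, x = None, x0.copy()
    d = None                       # recursive-momentum gradient estimator
    grad_norm_sum = 0.0            # accumulator driving the adaptive step size
    for t in range(1, steps + 1):
        sample = noise * rng.standard_normal(x.shape)
        g = stoch_grad(x, sample)
        if d is None:
            d = g                  # first step: plain stochastic gradient
        else:
            g_prev = stoch_grad(x_prev, sample)   # same sample, previous iterate
            a = min(1.0, c / t)                   # placeholder momentum schedule
            d = g + (1.0 - a) * (d - g_prev)      # variance-reduced estimator
        grad_norm_sum += np.dot(g, g)
        eta = eta0 / np.sqrt(1.0 + grad_norm_sum) # placeholder adaptive step size
        x_prev, x = x, x - eta * d
    return x

x_final = storm_like(np.ones(10))
print("final iterate norm:", np.linalg.norm(x_final))
```

The appeal of this template, and the point made fully adaptive by STORM+ and generalized by META-STORM, is that the step size is set from accumulated gradient statistics rather than from known smoothness or variance bounds.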
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Multi-fidelity Constrained Optimization for Stochastic Black Box
Simulators [1.6385815610837167]
We introduce the algorithm Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the issues mentioned earlier.
Scout-Nd efficiently estimates the gradient, reduces the noise of the estimator gradient, and applies multi-fidelity schemes to further reduce computational effort.
We validate our approach on standard benchmarks, demonstrating its effectiveness in optimizing parameters and highlighting better performance compared to existing methods.
arXiv Detail & Related papers (2023-11-25T23:36:38Z) - Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately.
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with the function approximations.
arXiv Detail & Related papers (2022-12-03T14:39:06Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, up to an error neighborhood.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)