Variance Reduction on Adaptive Stochastic Mirror Descent
- URL: http://arxiv.org/abs/2012.13760v2
- Date: Mon, 19 Apr 2021 17:19:41 GMT
- Title: Variance Reduction on Adaptive Stochastic Mirror Descent
- Authors: Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng
- Abstract summary: We prove that variance reduction reduces the stochastic first-order oracle (SFO) complexity of most adaptive mirror descent algorithms and accelerates their convergence.
We check the validity of our claims using experiments in deep learning.
- Score: 23.451652399157002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the idea of variance reduction applied to adaptive
stochastic mirror descent algorithms for nonsmooth nonconvex finite-sum
optimization problems. We propose a simple yet generalized adaptive mirror
descent algorithm with variance reduction named SVRAMD and provide its
convergence analysis in different settings. We prove that variance reduction
reduces the SFO complexity of most adaptive mirror descent algorithms and
accelerates their convergence. In particular, our general theory implies that
variance reduction can be applied to algorithms using time-varying step sizes
and self-adaptive algorithms such as AdaGrad and RMSProp. Moreover, the
convergence rates of SVRAMD recover the best existing rates of non-adaptive
variance reduced mirror descent algorithms. We check the validity of our claims
using experiments in deep learning.
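As a rough illustration of the idea (not the authors' exact SVRAMD algorithm; the function names, the diagonal AdaGrad-style proximal term, and all hyperparameters below are illustrative assumptions), the sketch combines an SVRG-style variance-reduced gradient with an adaptive mirror-descent step:

```python
import numpy as np

def svrg_adaptive_mirror_descent(grads, x0, n_epochs=10, inner_steps=100,
                                 lr=0.1, eps=1e-8, rng=None):
    """Hypothetical sketch: SVRG-style variance reduction combined with an
    AdaGrad-like (diagonal) mirror-descent step. Not the paper's exact SVRAMD.

    grads: list of per-sample gradient functions f_i'(x); x0: initial point.
    """
    rng = rng or np.random.default_rng(0)
    n = len(grads)
    x = x0.copy()
    accum = np.zeros_like(x0)                 # accumulated squared gradients (AdaGrad-style)
    for _ in range(n_epochs):
        snapshot = x.copy()
        full_grad = sum(g(snapshot) for g in grads) / n   # full gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced gradient estimate: unbiased, with shrinking variance
            v = grads[i](x) - grads[i](snapshot) + full_grad
            accum += v ** 2
            precond = np.sqrt(accum) + eps
            # mirror-descent step under a diagonal adaptive proximal function
            x = x - lr * v / precond
    return x
```

With `precond` fixed to all-ones this reduces to plain SVRG; the diagonal preconditioner mimics the self-adaptive proximal functions (AdaGrad/RMSProp-type) that the paper's general theory covers.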
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Optimal Guarantees for Algorithmic Reproducibility and Gradient
Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility.
We demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv Detail & Related papers (2023-10-26T19:56:52Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Computational Complexity of Sub-linear Convergent Algorithms [0.0]
We explore starting with a small sample, geometrically increasing its size, and using the solution from the previous sample to warm-start the empirical risk minimization (ERM) on the new one.
This addresses problems handled by first-order optimization algorithms with sublinear convergence, but at lower computational complexity.
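A minimal sketch of the procedure described above, assuming a generic `solve_erm(sample, x_start)` subroutine (hypothetical name) that minimizes the empirical risk on a given sample from a warm start:

```python
def progressive_erm(data, solve_erm, x_init, n0=32, growth=2):
    """Sketch: geometrically growing samples with warm-started ERM solves.
    `solve_erm(sample, x_start)` is a hypothetical subroutine returning the
    empirical risk minimizer on `sample`, initialized at `x_start`."""
    x, m = x_init, n0
    while m <= len(data):
        x = solve_erm(data[:m], x)   # warm start from the previous solution
        m *= growth                  # geometrically increase the sample size
    return x
```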
arXiv Detail & Related papers (2022-09-29T05:38:06Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze stochastic algorithms for zeroth-order optimization of nonconvex composite objectives, focusing on the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like distance-generating function.
To improve the gradient estimates, we replace the classic sampling method based on Rademacher vectors and show that the mini-batch method copes with non-Euclidean geometry.
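For context, a standard two-point zeroth-order gradient estimator (not necessarily the exact sampling scheme analyzed in the paper) is $\hat g(x) = \frac{f(x + \delta u) - f(x - \delta u)}{2\delta}\, u$, where $u$ is a random direction (e.g., a Rademacher or spherical sample) and $\delta > 0$ a small smoothing radius; the choice of the distribution of $u$ and of the mini-batch size drives the dimension dependence discussed above.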
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Bregman divergence based em algorithm and its application to classical
and quantum rate distortion theory [61.12008553173672]
We address the problem of minimizing the Bregman divergence between an exponential subfamily and a mixture subfamily in a Bregman divergence system.
We apply this algorithm to rate distortion and its variants including the quantum setting.
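For reference, the Bregman divergence generated by a strictly convex function $F$ is $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$; with $F(p) = \sum_i p_i \log p_i$ it recovers the KL divergence that appears in classical rate-distortion theory.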
arXiv Detail & Related papers (2022-01-07T13:33:28Z) - Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize [20.376216873620763]
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization.
We propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS).
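For reference, the standard stochastic Polyak stepsize, of which mSPS is the mirror-descent analogue (the paper's exact normalization may differ), is $\gamma_t = \min\big\{ \frac{f_{i_t}(x_t) - f_{i_t}^*}{c\, \|\nabla f_{i_t}(x_t)\|^2},\ \gamma_b \big\}$, where $f_{i_t}^*$ lower-bounds the sampled loss (zero for interpolating models), $c > 0$ is a scaling constant, and $\gamma_b$ caps the stepsize.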
arXiv Detail & Related papers (2021-10-28T19:49:40Z) - Adaptive Importance Sampling for Finite-Sum Optimization and Sampling
with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
arXiv Detail & Related papers (2021-03-23T00:28:15Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
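Here the Moreau envelope of $f$ with parameter $\lambda > 0$ is $f_\lambda(x) = \min_y \big\{ f(y) + \tfrac{1}{2\lambda} \|y - x\|^2 \big\}$; for weakly convex $f$ it is differentiable, and a small $\|\nabla f_\lambda(x)\|$ certifies that $x$ lies near a near-stationary point, which is why this norm is the natural convergence measure.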
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - The Hessian Estimation Evolution Strategy [3.756550107432323]
We present a novel black box optimization algorithm called Hessian Estimation Evolution Strategy.
The algorithm updates the covariance matrix of its sampling distribution by directly estimating the curvature of the objective function.
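For intuition, the curvature of $f$ along a sampled direction $d$ can be estimated without gradients from the symmetric finite difference $f(x + d) - 2 f(x) + f(x - d) \approx d^\top \nabla^2 f(x)\, d$; the strategy uses such paired evaluations to rescale its sampling covariance, with the precise update rule given in the paper.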
arXiv Detail & Related papers (2020-03-30T08:01:16Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
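Here a PL (Polyak-Łojasiewicz) objective is one satisfying $\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,\big(f(x) - f^*\big)$ for some $\mu > 0$; such functions admit linear convergence rates despite being possibly nonconvex, which is why the PL and non-PL regimes are treated separately.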
arXiv Detail & Related papers (2020-02-13T05:42:27Z)