Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize
- URL: http://arxiv.org/abs/2110.15412v3
- Date: Wed, 24 May 2023 21:02:59 GMT
- Title: Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize
- Authors: Ryan D'Orazio, Nicolas Loizou, Issam Laradji, Ioannis Mitliagkas
- Abstract summary: We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization.
We propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS).
- Score: 20.376216873620763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the convergence of stochastic mirror descent (SMD) under
interpolation in relatively smooth and smooth convex optimization. In
relatively smooth convex optimization we provide new convergence guarantees for
SMD with a constant stepsize. For smooth convex optimization we propose a new
adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS).
Notably, our convergence results in both settings do not make bounded gradient
assumptions or bounded variance assumptions, and we show convergence to a
neighborhood that vanishes under interpolation. Consequently, these results
correspond to the first convergence guarantees under interpolation for the
exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS
generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et
al. 2021) to mirror descent and remains both practical and efficient for modern
machine learning applications while inheriting the benefits of mirror descent.
We complement our results with experiments across various supervised learning
tasks and different instances of SMD, demonstrating the effectiveness of mSPS.
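As a concrete illustration of the abstract, the sketch below runs SMD with the negative-entropy mirror map (the exponentiated gradient algorithm mentioned above) and an mSPS-style capped stepsize on a toy realizable least-squares problem over the simplex. The loss model, the lower bound $f_i^* = 0$, and the cap `gamma_max` are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def msps_exponentiated_gradient(A, b, steps=3000, c=0.5, gamma_max=1.0, seed=0):
    """Sketch: SMD with the negative-entropy mirror map (exponentiated
    gradient) and an mSPS-style stepsize.  Assumes each loss is
    f_i(x) = 0.5 * (a_i @ x - b_i)**2 with lower bound f_i^* = 0
    (an interpolation-style assumption)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)              # start at the simplex center
    for _ in range(steps):
        i = rng.integers(n)              # sample one component loss f_i
        r = A[i] @ x - b[i]
        f_i = 0.5 * r * r
        g = r * A[i]                     # gradient of f_i at x
        dual_sq = np.max(np.abs(g)) ** 2  # squared ell_inf (dual of ell_1) norm
        if dual_sq == 0.0:
            continue                     # already exact on this sample
        gamma = min(f_i / (c * dual_sq), gamma_max)  # Polyak-type stepsize, capped
        x = x * np.exp(-gamma * g)       # multiplicative (EG) update
        x /= x.sum()                     # renormalize onto the simplex
    return x
```

On a realizable problem (b = A @ x_true for some x_true in the simplex) the iterates stay in the simplex by construction and the average loss decreases, consistent with the vanishing-neighborhood guarantee under interpolation.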
Related papers
- Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance [10.11126899274029]
We propose and explore new Polyak-type variants suitable for the update rule of the stochastic Heavy Ball method (SHB).
For MomSPS$_{\max}$, we provide guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation).
The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge of the problem parameters.
arXiv Detail & Related papers (2024-06-06T15:08:06Z) - Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence
and Variance Reduction [26.9632099249085]
We propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings.
We equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations.
arXiv Detail & Related papers (2023-08-11T10:17:29Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order optimization of entropic composite objectives, focusing on the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like distance-generating function.
To improve the gradient estimate, we replace the classic sampling method based on Rademacher random variables and show that the mini-batch method copes with the non-Euclidean geometry.
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
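The correspondence above can be made concrete: for entropically regularized optimal transport, Sinkhorn's alternating scaling updates are the primal iterations that the paper identifies with mirror descent under the KL divergence. A minimal sketch, where the cost matrix `C`, marginals `mu`/`nu`, and regularization `eps` are illustrative choices:

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.5, iters=500):
    """Minimal Sinkhorn iteration for entropically regularized OT.
    Each update is a multiplicative scaling that matches one marginal;
    the paper interprets these primal iterations as mirror descent on
    joint distributions with the Kullback--Leibler divergence."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)                  # match the row marginal
        v = nu / (K.T @ u)                # match the column marginal
    return u[:, None] * K * v[None, :]    # transport plan
```

After convergence the returned plan has (approximately) the prescribed row and column marginals; smaller `eps` sharpens the plan but slows convergence.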
arXiv Detail & Related papers (2022-06-17T16:19:47Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with a constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving averages (SEMA) for non-convex optimization problems.
Using SEMA, we establish state-of-the-art results for these problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Variance Reduction on Adaptive Stochastic Mirror Descent [23.451652399157002]
We prove that variance reduction reduces SFO complexity of most adaptive mirror descent algorithms and accelerates their convergence.
We check the validity of our claims using experiments in deep learning.
arXiv Detail & Related papers (2020-12-26T15:15:51Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.