Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize
- URL: http://arxiv.org/abs/2110.15412v3
- Date: Wed, 24 May 2023 21:02:59 GMT
- Title: Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
via the Mirror Stochastic Polyak Stepsize
- Authors: Ryan D'Orazio, Nicolas Loizou, Issam Laradji, Ioannis Mitliagkas
- Abstract summary: We investigate the convergence of mirror descent (SMD) under in relatively smooth and smooth convex optimization.
We propose a new adaptive stepsize scheme -- the mirror Polyak stepsize (mSPS)
- Score: 20.376216873620763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the convergence of stochastic mirror descent (SMD) under
interpolation in relatively smooth and smooth convex optimization. In
relatively smooth convex optimization we provide new convergence guarantees for
SMD with a constant stepsize. For smooth convex optimization we propose a new
adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS).
Notably, our convergence results in both settings do not make bounded gradient
assumptions or bounded variance assumptions, and we show convergence to a
neighborhood that vanishes under interpolation. Consequently, these results
correspond to the first convergence guarantees under interpolation for the
exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS
generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et
al. 2021) to mirror descent and remains both practical and efficient for modern
machine learning applications while inheriting the benefits of mirror descent.
We complement our results with experiments across various supervised learning
tasks and different instances of SMD, demonstrating the effectiveness of mSPS.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L0)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothnessed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Regularized Projection Matrix Approximation with Applications to Community Detection [1.3761665705201904]
This paper introduces a regularized projection matrix approximation framework designed to recover cluster information from the affinity matrix.
We investigate three distinct penalty functions, each specifically tailored to address bounded, positive, and sparse scenarios.
Numerical experiments conducted on both synthetic and real-world datasets reveal that our regularized projection matrix approximation approach significantly outperforms state-of-the-art methods in clustering performance.
arXiv Detail & Related papers (2024-05-26T15:18:22Z) - Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence
and Variance Reduction [26.9632099249085]
We propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings.
We equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $smashwidetildemathcalO(n+1/epsilon)$ gradient evaluations.
arXiv Detail & Related papers (2023-08-11T10:17:29Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order entropy composite objectives, focusing on dependence on dimensionality.
This is achieved by exploiting low dimensional structure of the decision set using the mirror descent method with an estimation alike function.
To improve the gradient, we replace the classic sampling method based on Rademacher and show that the mini-batch method copes with non-Eucli geometry.
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Mirror Descent with Relative Smoothness in Measure Spaces, with
application to Sinkhorn and EM [11.007661197604065]
This paper studies the convergence of the mirror descent algorithm in an infinite-dimensional setting.
Applying our result to joint distributions and the Kullback--Leibler divergence, we show that Sinkhorn's primal iterations for optimal transport correspond to a mirror descent.
arXiv Detail & Related papers (2022-06-17T16:19:47Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used estimator based on moving average (SEMA) problems.
For all these-the-art results, we also present the results for all these-the-art problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Variance Reduction on Adaptive Stochastic Mirror Descent [23.451652399157002]
We prove that variance reduction reduces SFO complexity of most adaptive mirror descent algorithms and accelerates their convergence.
We check the validity of our claims using experiments in deep learning.
arXiv Detail & Related papers (2020-12-26T15:15:51Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.