Convergence Analysis of Riemannian Stochastic Approximation Schemes
- URL: http://arxiv.org/abs/2005.13284v3
- Date: Wed, 19 May 2021 14:25:42 GMT
- Title: Convergence Analysis of Riemannian Stochastic Approximation Schemes
- Authors: Alain Durmus, Pablo Jiménez, Éric Moulines, Salem Said, Hoi-To Wai
- Abstract summary: This paper analyzes a class of Riemannian stochastic approximation (SA) schemes for tackling stochastic optimization problems.
We show that the conditions we derive are considerably milder than previous works.
We also consider the case where the mean-field function can only be estimated up to a small bias, and/or the case in which the samples are drawn from a controlled Markov chain.
- Score: 39.32179384256228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper analyzes the convergence for a large class of Riemannian
stochastic approximation (SA) schemes, which aim at tackling stochastic
optimization problems. In particular, the recursions we study use either the
exponential map of the considered manifold (geodesic schemes) or more general
retraction functions (retraction schemes) used as a proxy for the exponential
map. Such approximations are of great interest since they are low complexity
alternatives to geodesic schemes. Under the assumption that the mean field of
the SA is correlated with the gradient of a smooth Lyapunov function (possibly
non-convex), we show that the above Riemannian SA schemes find an
${\mathcal{O}}(b_\infty + \log n / \sqrt{n})$-stationary point (in expectation)
within ${\mathcal{O}}(n)$ iterations, where $b_\infty \geq 0$ is the asymptotic
bias. Compared to previous works, the conditions we derive are considerably
milder. First, all our analyses are global, as we do not assume the iterates to be
a priori bounded. Second, we study biased SA schemes. To be more specific, we
consider the case where the mean-field function can only be estimated up to a
consider the case where the mean-field function can only be estimated up to a
small bias, and/or the case in which the samples are drawn from a controlled
Markov chain. Third, the conditions on retractions required to ensure
convergence of the related SA schemes are weak and hold for well-known
examples. We illustrate our results on three machine learning problems.
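To make the retraction idea concrete, here is a minimal sketch of a retraction-based SA recursion $x_{n+1} = R_{x_n}(-\gamma_n H(x_n, X_{n+1}))$ on the unit sphere. The choice of manifold, the Rayleigh-quotient objective, the noise model, and all function names are illustrative assumptions for this sketch, not the paper's algorithms or experiments.

```python
import numpy as np

def sphere_retraction(x, v):
    """Metric-projection retraction on the unit sphere: R_x(v) = (x+v)/||x+v||.
    A low-complexity proxy for the exponential map (which follows a great circle)."""
    y = x + v
    return y / np.linalg.norm(y)

def tangent_project(x, g):
    """Project an ambient-space vector onto the tangent space at x on the sphere."""
    return g - np.dot(x, g) * x

def riemannian_sa(grad_sample, x0, n_iters):
    """Retraction-based SA: x_{n+1} = R_{x_n}(-gamma_n H(x_n, X_{n+1})),
    with a decreasing step size gamma_n = 1/sqrt(n+1)."""
    x = x0 / np.linalg.norm(x0)
    for n in range(n_iters):
        h = tangent_project(x, grad_sample(x))   # noisy mean-field estimate
        x = sphere_retraction(x, -h / np.sqrt(n + 1))
    return x

# Toy example: minimize f(x) = -x^T A x on the sphere (leading eigenvector of A),
# with additive Gaussian noise standing in for the stochastic oracle.
rng = np.random.default_rng(0)
A = np.diag([3.0, 1.0, 0.5])
noisy_grad = lambda x: -2.0 * A @ x + 0.1 * rng.standard_normal(3)
x_star = riemannian_sa(noisy_grad, np.ones(3), 2000)
```

Because the retraction only requires a vector addition and a normalization, each iteration avoids the trigonometric evaluations a geodesic (exponential-map) step on the sphere would need, which is the complexity advantage the abstract alludes to.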
Related papers
- Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP).
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
We study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of an objective function $J(\cdot)$.
Our results apply to a class of "invex" functions, which have the property that every stationary point is also a global minimizer.
arXiv Detail & Related papers (2023-12-05T15:22:39Z) - Demystifying the Myths and Legends of Nonconvex Convergence of SGD [17.445810977264067]
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems.
Through our analyses, we address certain myths and legends related to the nonconvex convergence of SGD.
arXiv Detail & Related papers (2023-10-19T17:58:59Z) - The Curse of Memory in Stochastic Approximation: Extended Version [1.534667887016089]
Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control.
Recent results establish remarkable performance of SA with a (sufficiently small) constant step-size $\alpha > 0$.
arXiv Detail & Related papers (2023-09-06T12:22:32Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored due to the set-valued nature of the subdifferential.
arXiv Detail & Related papers (2022-06-30T02:17:11Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Generalization in Supervised Learning Through Riemannian Contraction [4.3604518673788135]
In a supervised learning setting, we show that if an optimizer is contracting in some Riemannian metric with rate $\lambda > 0$, it is uniformly algorithmically stable.
The results hold for stochastic and deterministic gradient descent, in both continuous and discrete time.
They can be shown to be optimal in certain settings, such as over convex or strongly convex loss surfaces.
arXiv Detail & Related papers (2022-01-17T23:08:47Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.