Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way
- URL: http://arxiv.org/abs/2410.13067v1
- Date: Wed, 16 Oct 2024 21:49:27 GMT
- Title: Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way
- Authors: Jeongyeol Kwon, Luke Dotson, Yudong Chen, Qiaomin Xie,
- Abstract summary: We investigate it constant stpesize schemes through the lens of Markov processes.
We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes.
- Score: 12.331596909999764
- License:
- Abstract: Previous studies on two-timescale stochastic approximation (SA) mainly focused on bounding mean-squared errors under diminishing stepsize schemes. In this work, we investigate {\it constant} stpesize schemes through the lens of Markov processes, proving that the iterates of both timescales converge to a unique joint stationary distribution in Wasserstein metric. We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes in the presence of Markovian noise. Specifically, with two constant stepsizes $\alpha < \beta$, we show that the biases scale linearly with both stepsizes as $\Theta(\alpha)+\Theta(\beta)$ up to higher-order terms, while the variance of the slower iterate (resp., faster iterate) scales only with its own stepsize as $O(\alpha)$ (resp., $O(\beta)$). Unlike previous work, our results require no additional assumptions such as $\beta^2 \ll \alpha$ nor extra dependence on dimensions. These fine-grained characterizations allow tail-averaging and extrapolation techniques to reduce variance and bias, improving mean-squared error bound to $O(\beta^4 + \frac{1}{t})$ for both iterates.
Related papers
- Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using a descent gradient (SGD) algorithm with a constant step size.
We provide an expansion of the mean-squared error of the resulting estimator with respect to the number iterations of $n$.
We establish that this chain is geometrically ergodic with respect to a defined weighted Wasserstein semimetric.
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Bias and Extrapolation in Markovian Linear Stochastic Approximation with
Constant Stepsizes [9.689344942945652]
We consider Linear Approximation (LSA) with a constant stepsize and Markovian data.
We show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize.
We show that the bias can be reduced using Richardson-Romberg extrapolation with $mge 2$ stepsizes.
arXiv Detail & Related papers (2022-10-03T14:11:03Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_mathrmmix tfracdn$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs $langle theta_*, phi(X) rangle = Y$, where $phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of gradient descent (SGD)
We show that SGD avoids saddle points/manifolds with $1$ for strict step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Almost sure convergence rates for Stochastic Gradient Descent and
Stochastic Heavy Ball [17.33867778750777]
We study gradient descent (SGD) and the heavy ball method (SHB) for the general approximation problem.
For SGD, in the convex and smooth setting, we provide the first emphalmost sure convergence emphrates for a weighted average of the iterates.
arXiv Detail & Related papers (2020-06-14T11:12:05Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We show a proof of convergence between the Adam Adagrad and $O(d(N)/st)$ algorithms.
Adam converges with the same convergence $O(d(N)/st)$ when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z) - Finite Time Analysis of Linear Two-timescale Stochastic Approximation
with Markovian Noise [28.891930079358954]
We provide a finite-time analysis for linear two timescale SA scheme.
Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise.
We present an expansion of the expected error with a matching lower bound.
arXiv Detail & Related papers (2020-02-04T13:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.