On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration
- URL: http://arxiv.org/abs/2004.04719v1
- Date: Thu, 9 Apr 2020 17:54:18 GMT
- Title: On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration
- Authors: Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett,
Michael I. Jordan
- Abstract summary: We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
- Score: 115.1954841020189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We undertake a precise study of the asymptotic and non-asymptotic properties
of stochastic approximation procedures with Polyak-Ruppert averaging for
solving a linear system $\bar{A} \theta = \bar{b}$. When the matrix $\bar{A}$
is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates
with fixed step size and number of iterations going to infinity. The CLT
characterizes the exact asymptotic covariance matrix, which is the sum of the
classical Polyak-Ruppert covariance and a correction term that scales with the
step size. Under assumptions on the tail of the noise distribution, we prove a
non-asymptotic concentration inequality whose main term matches the covariance
in CLT in any direction, up to universal constants. When the matrix $\bar{A}$
is not Hurwitz but only has non-negative real parts in its eigenvalues, we
prove that the averaged LSA procedure actually achieves an $O(1/T)$ rate in
mean-squared error. Our results provide a more refined understanding of linear
stochastic approximation in both the asymptotic and non-asymptotic settings. We
also show various applications of the main results, including the study of
momentum-based stochastic gradient methods as well as temporal difference
algorithms in reinforcement learning.
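As an illustrative sketch (not the paper's exact procedure or constants), averaged linear stochastic approximation for $\bar{A} \theta = \bar{b}$ can be run from noisy observations $(A_t, b_t)$ with a constant step size, followed by Polyak-Ruppert averaging of the iterates. The step size, noise model, and dimensions below are assumptions chosen for the example:

```python
import numpy as np

def averaged_lsa(A_bar, b_bar, eta=0.05, T=20000, noise=0.1, seed=0):
    """Constant-step-size linear stochastic approximation (LSA) with
    Polyak-Ruppert averaging for solving A_bar @ theta = b_bar.
    Each step sees A_bar and b_bar corrupted by zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    d = b_bar.shape[0]
    theta = np.zeros(d)
    theta_sum = np.zeros(d)
    for _ in range(T):
        # Noisy observations (A_t, b_t) with E[A_t] = A_bar, E[b_t] = b_bar
        A_t = A_bar + noise * rng.standard_normal((d, d))
        b_t = b_bar + noise * rng.standard_normal(d)
        theta = theta - eta * (A_t @ theta - b_t)  # LSA update
        theta_sum += theta
    return theta_sum / T  # Polyak-Ruppert average of the iterates

# Example: A_bar with eigenvalues of positive real part, so the
# underlying ODE (and hence the iteration, for small eta) is stable
A_bar = np.array([[2.0, 0.5], [0.0, 1.0]])
b_bar = np.array([1.0, 1.0])
theta_bar = np.linalg.solve(A_bar, b_bar)  # exact solution [0.25, 1.0]
theta_hat = averaged_lsa(A_bar, b_bar)
```

The averaged iterate concentrates around $\bar{\theta}$; per the abstract, its covariance is the classical Polyak-Ruppert covariance plus a correction term scaling with the step size `eta`.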
Related papers
- Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using a stochastic gradient descent (SGD) algorithm with a constant step size.
We provide an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$.
We establish that this chain is geometrically ergodic with respect to a defined weighted Wasserstein semimetric.
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients [0.8749675983608172]
We present an unbiased method for posterior means based on kinetic Langevin dynamics.
Our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem.
Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the "gold-standard" randomized Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2023-11-08T21:19:52Z) - The Curse of Memory in Stochastic Approximation: Extended Version [1.534667887016089]
Theory and application of stochastic approximation (SA) have grown within the control systems community since the earliest days of adaptive control.
Recent results establish remarkable performance of SA with (sufficiently small) constant step size $\alpha>0$.
arXiv Detail & Related papers (2023-09-06T12:22:32Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Stationary Behavior of Constant Stepsize SGD Type Algorithms: An
Asymptotic Characterization [4.932130498861987]
We study the behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero.
We show that the limiting scaled stationary distribution is a solution of an integral equation.
arXiv Detail & Related papers (2021-11-11T17:39:50Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
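The Richardson-Romberg extrapolation mentioned in the first related paper combines two constant-step-size Polyak-Ruppert averages, run with step sizes $\gamma$ and $2\gamma$, as $2\bar{\theta}_\gamma - \bar{\theta}_{2\gamma}$ to cancel the leading step-size-dependent bias term. A hedged sketch on a synthetic quadratic (all parameters here are illustrative assumptions; the bias cancellation matters most for non-quadratic objectives, where the stationary mean shifts with the step size):

```python
import numpy as np

def sgd_average(grad, theta0, eta, T, rng):
    """Constant-step-size SGD with Polyak-Ruppert averaging."""
    theta = theta0.copy()
    avg = np.zeros_like(theta)
    for _ in range(T):
        theta = theta - eta * grad(theta, rng)
        avg += theta
    return avg / T

# Quadratic objective f(theta) = 0.5 * theta^T H theta - c^T theta,
# with additive Gaussian gradient noise (illustrative setup)
H = np.array([[3.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 2.0])

def grad(theta, rng):
    return H @ theta - c + 0.2 * rng.standard_normal(2)

rng = np.random.default_rng(1)
theta0 = np.zeros(2)
eta, T = 0.1, 50000
avg_eta = sgd_average(grad, theta0, eta, T, rng)       # step size eta
avg_2eta = sgd_average(grad, theta0, 2 * eta, T, rng)  # step size 2*eta
# Richardson-Romberg extrapolation cancels the O(eta) bias term
theta_rr = 2 * avg_eta - avg_2eta
theta_star = np.linalg.solve(H, c)  # exact minimizer
```

This complements the main paper's finding that the asymptotic covariance of averaged iterates contains a correction term scaling with the step size: extrapolating across two step sizes removes the analogous first-order bias.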
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.