Tight High Probability Bounds for Linear Stochastic Approximation with
Fixed Stepsize
- URL: http://arxiv.org/abs/2106.01257v1
- Date: Wed, 2 Jun 2021 16:10:37 GMT
- Title: Tight High Probability Bounds for Linear Stochastic Approximation with
Fixed Stepsize
- Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Kevin
Scaman, Hoi-To Wai
- Abstract summary: This paper provides a non-asymptotic analysis of linear approximation (LSA) algorithms with fixed stepsize.
We derive high probability bounds on the performance of LSA under weaker conditions on the sequence $(bf A_n, bf b_n): n in mathbbN*$.
We show that our conclusions cannot be improved without additional assumptions on the sequence $bf A_n: n in mathbbN*$.
- Score: 41.38162218102825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper provides a non-asymptotic analysis of linear stochastic
approximation (LSA) algorithms with fixed stepsize. This family of methods
arises in many machine learning tasks and is used to obtain approximate
solutions of a linear system $\bar{A}\theta = \bar{b}$ for which $\bar{A}$ and
$\bar{b}$ can only be accessed through random estimates $\{({\bf A}_n, {\bf
b}_n): n \in \mathbb{N}^*\}$. Our analysis is based on new results regarding
moments and high probability bounds for products of matrices which are shown to
be tight. We derive high probability bounds on the performance of LSA under
weaker conditions on the sequence $\{({\bf A}_n, {\bf b}_n): n \in
\mathbb{N}^*\}$ than previous works. However, in contrast, we establish
polynomial concentration bounds with order depending on the stepsize. We show
that our conclusions cannot be improved without additional assumptions on the
sequence of random matrices $\{{\bf A}_n: n \in \mathbb{N}^*\}$, and in
particular that no Gaussian or exponential high probability bounds can hold.
Finally, we pay a particular attention to establishing bounds with sharp order
with respect to the number of iterations and the stepsize and whose leading
terms contain the covariance matrices appearing in the central limit theorems.
Related papers
- Convergence of Alternating Gradient Descent for Matrix Factorization [5.439020425819001]
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective.
We show that, for a rank-$r$ matrix $mathbfA in mathbbRm times n$, smoothness $C$ in the complexity $T$ to be an absolute constant.
arXiv Detail & Related papers (2023-05-11T16:07:47Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates
of Linear Stochastic Approximation [22.51165277694864]
This paper provides a finite-time analysis of linear approximation (LSA) algorithms with fixed step size.
LSA is used to compute approximate solutions of a $d$-dimensional linear system.
arXiv Detail & Related papers (2022-07-10T14:36:04Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_mathrmmix tfracdn$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs $langle theta_*, phi(X) rangle = Y$, where $phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We find an algorithm which finds.
epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using.
$(epsilon,gamma)$surimate random random points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.