The Curse of Memory in Stochastic Approximation: Extended Version
- URL: http://arxiv.org/abs/2309.02944v2
- Date: Sun, 17 Sep 2023 17:16:32 GMT
- Title: The Curse of Memory in Stochastic Approximation: Extended Version
- Authors: Caio Kalil Lauand and Sean Meyn
- Abstract summary: Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control.
Recent results establish remarkable performance of SA with (sufficiently small) constant step-size $\alpha>0$.
- Score: 1.534667887016089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Theory and application of stochastic approximation (SA) has grown within the
control systems community since the earliest days of adaptive control. This
paper takes a new look at the topic, motivated by recent results establishing
remarkable performance of SA with (sufficiently small) constant step-size
$\alpha>0$. If averaging is implemented to obtain the final parameter estimate,
then the estimates are asymptotically unbiased with nearly optimal asymptotic
covariance. These results have been obtained for random linear SA recursions
with i.i.d. coefficients. This paper obtains very different conclusions in the
more common case of geometrically ergodic Markovian disturbance: (i) The
$\textit{target bias}$ is identified, even in the case of non-linear SA, and is
in general non-zero. The remaining results are established for linear SA
recursions: (ii) the bivariate parameter-disturbance process is geometrically
ergodic in a topological sense; (iii) the representation for bias has a simpler
form in this case, and cannot be expected to be zero if there is multiplicative
noise; (iv) the asymptotic covariance of the averaged parameters is within
$O(\alpha)$ of optimal. The error term is identified, and may be massive if
mean dynamics are not well conditioned. The theory is illustrated with
application to TD-learning.
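As a concrete illustration of the regime the abstract describes, the following is a minimal numerical sketch (an illustrative toy, not the authors' code or experiments): a scalar linear SA recursion with constant step-size $\alpha$, whose coefficients $(A_n, b_n)$ are modulated by a two-state Markov chain so that the multiplicative noise is Markovian rather than i.i.d., followed by Polyak-Ruppert averaging over the tail of the run. All numerical values (the chain, the coefficients, $\alpha$) are assumptions made for the sketch.

```python
# Minimal sketch (illustrative only): constant step-size linear SA with
# Markovian multiplicative noise and Polyak-Ruppert (tail) averaging.
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.05            # constant step size (illustrative value)
n_iters = 200_000
burn_in = n_iters // 2  # average only the second half of the run

# Geometrically ergodic disturbance: two-state Markov chain with kernel P.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])            # stationary distribution (1/2, 1/2)

# State-dependent coefficients => multiplicative noise, correlated over time.
A = {0: -1.5, 1: -0.5}                # mean(A) = -1.0
b = {0:  1.5, 1:  0.5}                # mean(b) =  1.0
theta_star = 1.0                      # root of mean dynamics: -1.0*theta + 1.0 = 0

theta = 0.0
tail_sum, tail_count = 0.0, 0
x = 0                                 # current Markov state

for n in range(n_iters):
    x = rng.choice(2, p=P[x])                       # next disturbance state
    theta = theta + alpha * (A[x] * theta + b[x])   # constant step-size SA update
    if n >= burn_in:                                # Polyak-Ruppert averaging
        tail_sum += theta
        tail_count += 1

theta_bar = tail_sum / tail_count
print(f"averaged estimate = {theta_bar:.4f}, target = {theta_star:.4f}, "
      f"bias = {theta_bar - theta_star:+.5f}")
```

In the i.i.d.-coefficient setting of the earlier results, the averaged estimate would be asymptotically unbiased; per conclusion (iii) above, with Markovian multiplicative noise the averaged estimate cannot in general be expected to hit the target exactly, and the residual bias is of order $O(\alpha)$, shrinking as $\alpha$ is reduced.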
Related papers
- Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Convergence Analysis of Riemannian Stochastic Approximation Schemes [39.32179384256228]
This paper analyzes a class of Riemannian stochastic approximation (SA) schemes to tackle optimization problems.
We show that the conditions we derive are considerably milder than previous works.
We also consider the case where the mean-field function can only be estimated up to a small bias, and/or the case in which the samples are drawn from a Markov chain.
arXiv Detail & Related papers (2020-05-27T11:24:58Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.