The Curse of Memory in Stochastic Approximation: Extended Version
- URL: http://arxiv.org/abs/2309.02944v2
- Date: Sun, 17 Sep 2023 17:16:32 GMT
- Title: The Curse of Memory in Stochastic Approximation: Extended Version
- Authors: Caio Kalil Lauand and Sean Meyn
- Abstract summary: Theory and application of approximation (SA) has grown within the control systems community since the earliest days of adaptive control.
Recent results establish remarkable performance of SA with (sufficiently small) constant step-size $alpha>0$.
- Score: 1.534667887016089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Theory and application of stochastic approximation (SA) has grown within the
control systems community since the earliest days of adaptive control. This
paper takes a new look at the topic, motivated by recent results establishing
remarkable performance of SA with (sufficiently small) constant step-size
$\alpha>0$. If averaging is implemented to obtain the final parameter estimate,
then the estimates are asymptotically unbiased with nearly optimal asymptotic
covariance. These results have been obtained for random linear SA recursions
with i.i.d. coefficients. This paper obtains very different conclusions in the
more common case of geometrically ergodic Markovian disturbance: (i) The
$\textit{target bias}$ is identified, even in the case of non-linear SA, and is
in general non-zero. The remaining results are established for linear SA
recursions: (ii) the bivariate parameter-disturbance process is geometrically
ergodic in a topological sense; (iii) the representation for bias has a simpler
form in this case, and cannot be expected to be zero if there is multiplicative
noise; (iv) the asymptotic covariance of the averaged parameters is within
$O(\alpha)$ of optimal. The error term is identified, and may be massive if
mean dynamics are not well conditioned. The theory is illustrated with
application to TD-learning.
Related papers
- Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness [57.93371273485736]
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $ell$-smoothness condition $||nabla2f(x)|| le ellleft(||nabla f(x)||right),$ which generalizes the $L$-smoothness and $(L_0,L_1)$-smoothness.
arXiv Detail & Related papers (2025-08-09T08:28:06Z) - Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation.<n>We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z) - Online Covariance Estimation in Nonsmooth Stochastic Approximation [14.818683408659764]
We consider applying approximation (SA) methods to solve nonsmooth variational inclusion problems.
Our convergence construction establish the best-known for statistical estimation methods.
arXiv Detail & Related papers (2025-02-07T20:16:51Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
It is shown for the first time that this assumption is not required for convergence and finer results.<n>Rates of convergence are obtained for the standard algorithm and for estimates obtained via the averaging technique of Polyak and Ruppert.<n>Results from numerical experiments illustrate that $beta_theta$ may be large due to a combination of multiplicative noise and Markovian memory.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $mathbfw_1,ldots,mathbfw_kin.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $epsilon$-stationary points with $O(epsilon-4)$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(epsilon-3)$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization [21.334985032433778]
Decentralized optimization problem $min_bf xinmathbb Rd f(bf x)triq frac1msum_i=1m f_i(bf x)triq frac1nsum_j=1n.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Off-policy estimation of linear functionals: Non-asymptotic theory for
semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $tildeO(k/sqrtepsilon)$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Computational Complexity of Normalizing Constants for the Product of
Determinantal Point Processes [12.640283469603357]
We study the computational complexity of computing the normalizing constant.
We show that $sum_Sdet(bf A_S,S)p$ exactly for every (fixed) positive even integer $p$ is UP-hard and Mod$_3$P-hard.
arXiv Detail & Related papers (2021-11-28T14:08:25Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Accelerating Optimization and Reinforcement Learning with
Quasi-Stochastic Approximation [2.294014185517203]
This paper sets out to extend convergence theory to quasi-stochastic approximations.
It is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
arXiv Detail & Related papers (2020-09-30T04:44:45Z) - Convergence Analysis of Riemannian Stochastic Approximation Schemes [39.32179384256228]
This paper analyzes a class of correlated approximation (SA) schemes to tackle optimization problems.
We show that the conditions we derive are considerably milder than previous works.
Third, we consider the case where the mean-field function can only be estimated up to a small bias, and/or the case in which the samples are drawn from a chain.
arXiv Detail & Related papers (2020-05-27T11:24:58Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model.
We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the universality error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds on general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.