Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning
- URL: http://arxiv.org/abs/2405.16644v2
- Date: Sun, 02 Feb 2025 12:01:52 GMT
- Title: Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning
- Authors: Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov,
- Abstract summary: We prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstraps.
We illustrate our findings in the setting of temporal difference learning with linear function approximation.
- Score: 15.041074872715752
- License:
- Abstract: In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of the confidence intervals for parameter estimation with LSA based on multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of subsequent observations. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation.
First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.
We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Asymptotic Time-Uniform Inference for Parameters in Averaged Stochastic Approximation [23.89036529638614]
We study time-uniform statistical inference for parameters in approximation (SA)
We analyze the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems.
arXiv Detail & Related papers (2024-10-19T10:27:26Z) - Effectiveness of Constant Stepsize in Markovian LSA and Statistical
Inference [9.689344942945652]
We study the effectiveness of using a constant stepsize in statistical inference via linear approximation (LSA) algorithms with Markovian data.
Our results show that using a constant stepsize enjoys easy hyper parameter tuning, fast convergence, and consistently better CI coverage, especially when data is limited.
arXiv Detail & Related papers (2023-12-18T02:51:57Z) - Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability [17.771354881467435]
We show that a simple algorithm with a universal and instance-independent step size is sufficient to obtain near-optimal variance and bias terms.
Our proof technique is based on refined error bounds for linear approximation together with the novel stability result for the product of random matrices.
arXiv Detail & Related papers (2023-10-22T12:37:25Z) - Variational sparse inverse Cholesky approximation for latent Gaussian
processes via double Kullback-Leibler minimization [6.012173616364571]
We combine a variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior.
For this setting, our variational approximation can be computed via gradient descent in polylogarithmic time per iteration.
We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate for stationary kernels than alternative approaches.
arXiv Detail & Related papers (2023-01-30T21:50:08Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - A maximum-entropy approach to off-policy evaluation in average-reward
MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs)
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex
Envelopes [40.31139355952393]
We construct a smooth Lyapunov function using the generalized envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function.
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
We also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning.
arXiv Detail & Related papers (2020-02-03T16:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.