Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs
- URL: http://arxiv.org/abs/2502.04297v1
- Date: Thu, 06 Feb 2025 18:39:03 GMT
- Title: Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs
- Authors: Wenlong Mou
- Abstract summary: We study the estimation of the value function for continuous-time Markov diffusion processes.
Our work provides non-asymptotic statistical guarantees for the least-squares temporal-difference method.
- Score: 2.926192989090622
- Abstract: We study the estimation of the value function for continuous-time Markov diffusion processes using a single, discretely observed ergodic trajectory. Our work provides non-asymptotic statistical guarantees for the least-squares temporal-difference (LSTD) method, with performance measured in the first-order Sobolev norm. Specifically, the estimator attains an $O(1 / \sqrt{T})$ convergence rate when using a trajectory of length $T$; notably, this rate is achieved as long as $T$ scales nearly linearly with both the mixing time of the diffusion and the number of basis functions employed. A key insight of our approach is that the ellipticity inherent in the diffusion process ensures robust performance even as the effective horizon diverges to infinity. Moreover, we demonstrate that the Markovian component of the statistical error can be controlled by the approximation error, while the martingale component grows at a slower rate relative to the number of basis functions. By carefully balancing these two sources of error, our analysis reveals novel trade-offs between approximation and statistical errors.
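The abstract describes least-squares temporal-difference (LSTD) estimation of a diffusion's value function over a finite set of basis functions, using a single discretely observed trajectory. As a rough illustration only, and not the paper's estimator or its Sobolev-norm analysis, the sketch below applies standard discrete-time LSTD to a time-discretized trajectory; the function names, the effective discount `gamma = exp(-beta * delta)`, the ridge term, and the Ornstein-Uhlenbeck usage example are all illustrative assumptions.

```python
import numpy as np

def lstd_value_estimate(traj, rewards, features, delta, beta=1.0, ridge=1e-8):
    """Sketch of an LSTD fit V_hat(x) = <theta, phi(x)> from one trajectory
    of a diffusion observed every `delta` units of time."""
    gamma = np.exp(-beta * delta)                  # effective one-step discount
    Phi = np.array([features(x) for x in traj])    # (T+1, d) feature matrix
    d = Phi.shape[1]

    # Accumulate the LSTD normal equations A @ theta = b along the trajectory.
    A = ridge * np.eye(d)                          # small ridge for numerical stability
    b = np.zeros(d)
    for k in range(len(traj) - 1):
        A += np.outer(Phi[k], Phi[k] - gamma * Phi[k + 1])
        b += Phi[k] * rewards[k] * delta           # reward integrated over one step
    theta = np.linalg.solve(A, b)
    return lambda x: features(x) @ theta


# Illustrative use (assumed setup): Ornstein-Uhlenbeck trajectory,
# quadratic reward, and a small polynomial basis {1, s, s^2}.
rng = np.random.default_rng(0)
delta, n_steps = 0.01, 20_000
x = np.zeros(n_steps + 1)
for k in range(n_steps):                           # Euler-Maruyama discretization
    x[k + 1] = x[k] - x[k] * delta + np.sqrt(delta) * rng.standard_normal()

feat = lambda s: np.array([1.0, s[0], s[0] ** 2])
V_hat = lstd_value_estimate(x[:, None], x ** 2, feat, delta)
print(V_hat(np.array([0.5])))
```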
Related papers
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation (a minimal illustrative sketch of this setup appears after the list below).
First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.
We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - A Stability Principle for Learning under Non-Stationarity [1.1510009152620668]
We develop a versatile framework for statistical learning in non-stationary environments.
At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
arXiv Detail & Related papers (2023-10-27T17:53:53Z) - Online Statistical Inference for Nonlinear Stochastic Approximation with Markovian Data [22.59079286063505]
We study the statistical inference of nonlinear stochastic approximation algorithms using a single trajectory of Markovian data.
Our methodology has practical applications in various scenarios, such as Stochastic Gradient Descent (SGD) on autoregressive data and asynchronous Q-Learning.
arXiv Detail & Related papers (2023-02-15T14:31:11Z) - On the Statistical Benefits of Temporal Difference Learning [6.408072565019087]
Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions.
We show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates.
We prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states.
arXiv Detail & Related papers (2023-01-30T21:02:25Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Policy evaluation from a single path: Multi-step methods, mixing and mis-specification [45.88067550131531]
We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process.
We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference estimates.
arXiv Detail & Related papers (2022-11-07T23:15:25Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Learning Asynchronous and Error-prone Longitudinal Data via Functional Calibration [4.446626375802735]
We propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on functional data with measurement error.
For regression with time-invariant coefficients, our estimator is root-n consistent and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate.
The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation.
arXiv Detail & Related papers (2022-09-28T03:27:31Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
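One related paper above studies TD learning with Polyak-Ruppert averaging and linear function approximation. As a hedged illustration of that setting, and not that paper's procedure or analysis, here is a minimal TD(0) sketch with iterate averaging; the function name, step-size schedule, and argument conventions are assumptions made for this example.

```python
import numpy as np

def td0_polyak_ruppert(transitions, features, gamma=0.9, step=0.5, decay=0.8):
    """Sketch of TD(0) with linear function approximation and
    Polyak-Ruppert (iterate) averaging for policy evaluation.

    transitions : iterable of (state, reward, next_state) tuples from one trajectory
    features    : callable mapping a state to a length-d feature vector
    """
    transitions = list(transitions)
    d = len(features(transitions[0][0]))
    theta = np.zeros(d)        # current TD iterate
    theta_bar = np.zeros(d)    # running Polyak-Ruppert average

    for t, (s, r, s_next) in enumerate(transitions):
        phi, phi_next = features(s), features(s_next)
        td_error = r + gamma * phi_next @ theta - phi @ theta
        theta = theta + (step / (t + 1) ** decay) * td_error * phi
        theta_bar += (theta - theta_bar) / (t + 1)   # online average of iterates

    # The averaged iterate is the quantity typically used for statistical inference.
    return theta_bar
```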
This list is automatically generated from the titles and abstracts of the papers on this site.