Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- URL: http://arxiv.org/abs/2211.03899v1
- Date: Mon, 7 Nov 2022 23:15:25 GMT
- Title: Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- Authors: Yaqi Duan, Martin J. Wainwright
- Abstract summary: We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process.
We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference estimates.
- Score: 45.88067550131531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study non-parametric estimation of the value function of an
infinite-horizon $\gamma$-discounted Markov reward process (MRP) using
observations from a single trajectory. We provide non-asymptotic guarantees for
a general family of kernel-based multi-step temporal difference (TD) estimates,
including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the
TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds
capture the estimation error's dependence on Bellman fluctuations, mixing time of the Markov
chain, any mis-specification in the model, as well as the choice of weight
function defining the estimator itself, and reveal some delicate interactions
between mixing time and model mis-specification. For a given TD method applied
to a well-specified model, its statistical error under trajectory data is
similar to that of i.i.d. sample transition pairs, whereas under
mis-specification, temporal dependence in data inflates the statistical error.
However, any such deterioration can be mitigated by increased look-ahead. We
complement our upper bounds by proving minimax lower bounds that establish
optimality of TD-based methods with appropriately chosen look-ahead and
weighting, and reveal some fundamental differences between value function
estimation and ordinary non-parametric regression.
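The multi-step look-ahead idea at the heart of the abstract can be sketched in code. Below is a minimal tabular illustration, not the paper's kernel-based estimator: a $K$-step TD update computed from a single trajectory of a synthetic Markov reward process, where the target sums $K$ discounted rewards and then bootstraps from the current value estimate. The toy chain, step size, and function names are all illustrative assumptions.

```python
import numpy as np

def k_step_td(states, rewards, num_states, gamma, K, alpha=0.1):
    """Tabular K-step TD value estimation from a single trajectory.

    states[t] is the state at time t; rewards[t] is the reward observed
    on the transition from states[t] to states[t + 1].
    """
    V = np.zeros(num_states)
    T = len(rewards)
    for t in range(T - K):
        # K-step look-ahead target: K discounted rewards along the path,
        # then bootstrap from the current value estimate at step t + K.
        G = sum(gamma**i * rewards[t + i] for i in range(K))
        G += gamma**K * V[states[t + K]]
        V[states[t]] += alpha * (G - V[states[t]])
    return V

# Toy 2-state MRP: uniform transitions, reward 1 when landing in state 1.
rng = np.random.default_rng(0)
T, gamma = 5000, 0.9
states = [0]
for _ in range(T):
    states.append(int(rng.integers(0, 2)))
rewards = [float(s) for s in states[1:]]

# True value is 0.5 / (1 - gamma) = 5 for both states.
V = k_step_td(states, rewards, num_states=2, gamma=gamma, K=5)
```

Increasing $K$ trades bootstrap bias for target variance, which mirrors the paper's observation that longer look-ahead can mitigate the error inflation that temporal dependence causes under mis-specification.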
Related papers
- Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs [2.926192989090622]
We study the estimation of the value function for continuous-time Markov diffusion processes.
Our work provides non-asymptotic statistical guarantees for the least-squares temporal-difference method.
arXiv Detail & Related papers (2025-02-06T18:39:03Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimation and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces [24.03281329962804]
In this paper, we focus on the non-asymptotic statistical rates of distributional temporal difference learning.
We show that for NTD with a generative model, a sample complexity of $\tilde{O}\big(\varepsilon^{-2} \mu_{\pi,\min}^{-1} (1-\gamma)^{-3} + t_{\mathrm{mix}} \mu_{\pi,\min}^{-1} (1-\gamma)^{-1}\big)$ suffices in the case of the $1$-Wasserstein distance.
We also establish a novel Freedman's inequality in Hilbert spaces.
arXiv Detail & Related papers (2024-03-09T06:19:53Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Improved Prediction and Network Estimation Using the Monotone Single Index Multi-variate Autoregressive Model [34.529641317832024]
We develop a semi-parametric approach based on the monotone single-index multi-variate autoregressive model (SIMAM).
We provide theoretical guarantees for dependent data and an alternating projected gradient descent algorithm.
We demonstrate the superior performance both on simulated data and two real data examples.
arXiv Detail & Related papers (2021-06-28T12:32:29Z)
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.