Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- URL: http://arxiv.org/abs/2211.03899v1
- Date: Mon, 7 Nov 2022 23:15:25 GMT
- Title: Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- Authors: Yaqi Duan, Martin J. Wainwright
- Abstract summary: We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process.
We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference estimates.
- Score: 45.88067550131531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study non-parametric estimation of the value function of an
infinite-horizon $\gamma$-discounted Markov reward process (MRP) using
observations from a single trajectory. We provide non-asymptotic guarantees for
a general family of kernel-based multi-step temporal difference (TD) estimates,
including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the
TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds
capture the estimation error's dependence on Bellman fluctuations, mixing time of the Markov
chain, any mis-specification in the model, as well as the choice of weight
function defining the estimator itself, and reveal some delicate interactions
between mixing time and model mis-specification. For a given TD method applied
to a well-specified model, its statistical error under trajectory data is
similar to that of i.i.d. sample transition pairs, whereas under
mis-specification, temporal dependence in data inflates the statistical error.
However, any such deterioration can be mitigated by increased look-ahead. We
complement our upper bounds by proving minimax lower bounds that establish
optimality of TD-based methods with appropriately chosen look-ahead and
weighting, and reveal some fundamental differences between value function
estimation and ordinary non-parametric regression.
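The multi-step look-ahead idea at the heart of the abstract can be sketched in code. Below is a minimal tabular illustration, not the paper's kernel-based estimator: a $K$-step TD update computed from a single trajectory of a synthetic Markov reward process, where the target sums $K$ discounted rewards and then bootstraps from the current value estimate. The toy chain, step size, and function names are all illustrative assumptions.

```python
import numpy as np

def k_step_td(states, rewards, num_states, gamma, K, alpha=0.1):
    """Tabular K-step TD value estimation from a single trajectory.

    states[t] is the state at time t; rewards[t] is the reward observed
    on the transition from states[t] to states[t + 1].
    """
    V = np.zeros(num_states)
    T = len(rewards)
    for t in range(T - K):
        # K-step look-ahead target: K discounted rewards along the path,
        # then bootstrap from the current value estimate at step t + K.
        G = sum(gamma**i * rewards[t + i] for i in range(K))
        G += gamma**K * V[states[t + K]]
        V[states[t]] += alpha * (G - V[states[t]])
    return V

# Toy 2-state MRP: uniform transitions, reward 1 when landing in state 1.
rng = np.random.default_rng(0)
T, gamma = 5000, 0.9
states = [0]
for _ in range(T):
    states.append(int(rng.integers(0, 2)))
rewards = [float(s) for s in states[1:]]

# True value is 0.5 / (1 - gamma) = 5 for both states.
V = k_step_td(states, rewards, num_states=2, gamma=gamma, K=5)
```

Increasing $K$ trades bootstrap bias for target variance, which mirrors the paper's observation that longer look-ahead can mitigate the error inflation that temporal dependence causes under mis-specification.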
Related papers
- Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs [2.926192989090622]
We study the estimation of the value function for continuous-time Markov diffusion processes.
Our work provides non-asymptotic statistical guarantees for the least-squares temporal-difference method.
arXiv Detail & Related papers (2025-02-06T18:39:03Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimation and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces [24.03281329962804]
In this paper, we focus on the non-asymptotic statistical rates of distributional temporal difference learning.
We show that for NTD with a generative model, a sample complexity of $\tilde{O}\big(\varepsilon^{-2} \mu_{\pi,\min}^{-1} (1-\gamma)^{-3} + t_{\mathrm{mix}} \mu_{\pi,\min}^{-1} (1-\gamma)^{-1}\big)$ suffices in the case of the $1$-Wasserstein distance.
We also establish a novel Freedman's inequality in Hilbert spaces.
arXiv Detail & Related papers (2024-03-09T06:19:53Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Improved Prediction and Network Estimation Using the Monotone Single Index Multi-variate Autoregressive Model [34.529641317832024]
We develop a semi-parametric approach based on the monotone single-index multi-variate autoregressive model (SIMAM).
We provide theoretical guarantees for dependent data and an alternating projected gradient descent algorithm.
We demonstrate the superior performance both on simulated data and two real data examples.
arXiv Detail & Related papers (2021-06-28T12:32:29Z)
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.