Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- URL: http://arxiv.org/abs/2211.03899v1
- Date: Mon, 7 Nov 2022 23:15:25 GMT
- Title: Policy evaluation from a single path: Multi-step methods, mixing and
mis-specification
- Authors: Yaqi Duan, Martin J. Wainwright
- Abstract summary: We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process.
We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference estimates.
- Score: 45.88067550131531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study non-parametric estimation of the value function of an
infinite-horizon $\gamma$-discounted Markov reward process (MRP) using
observations from a single trajectory. We provide non-asymptotic guarantees for
a general family of kernel-based multi-step temporal difference (TD) estimates,
including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the
TD$(\lambda)$ family for $\lambda \in [0,1)$ as special cases. Our bounds
capture the error's dependence on Bellman fluctuations, the mixing time of the Markov
chain, any mis-specification in the model, as well as the choice of weight
function defining the estimator itself, and reveal some delicate interactions
between mixing time and model mis-specification. For a given TD method applied
to a well-specified model, its statistical error under trajectory data is
similar to that of i.i.d. sample transition pairs, whereas under
mis-specification, temporal dependence in data inflates the statistical error.
However, any such deterioration can be mitigated by increased look-ahead. We
complement our upper bounds by proving minimax lower bounds that establish
optimality of TD-based methods with appropriately chosen look-ahead and
weighting, and reveal some fundamental differences between value function
estimation and ordinary non-parametric regression.
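
To make the $K$-step look-ahead idea concrete, here is a minimal sketch in Python. It is not the paper's kernel-based estimator: it assumes a small finite state space with a tabular (hence well-specified) value function, and the names `simulate_path`, `k_step_td`, and `exact_value` are hypothetical helpers introduced only for illustration. The estimate is the fixed point of the empirical $K$-step Bellman equation built from one trajectory.

```python
import random


def simulate_path(P, n, rng, start=0):
    """Sample a single trajectory x_0, ..., x_{n-1} from transition matrix P."""
    path = [start]
    for _ in range(n - 1):
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def k_step_td(path, r, gamma, K, sweeps=500):
    """Tabular K-step look-ahead TD estimate from one trajectory (illustrative).

    Finds the fixed point of
        V(s) = average over visits t to s of
               sum_{k<K} gamma^k r(x_{t+k}) + gamma^K V(x_{t+K}),
    by fixed-point iteration, a gamma^K-contraction in the sup norm.
    """
    S = len(r)
    n = len(path) - K                        # usable starting indices
    G_sum = [0.0] * S                        # summed K-step discounted rewards
    counts = [[0] * S for _ in range(S)]     # counts[s][j]: x_t = s, x_{t+K} = j
    for t in range(n):
        s = path[t]
        G_sum[s] += sum(gamma**k * r[path[t + k]] for k in range(K))
        counts[s][path[t + K]] += 1
    visits = [sum(row) for row in counts]
    V = [0.0] * S
    for _ in range(sweeps):
        V = [
            (G_sum[s] + gamma**K * sum(c * v for c, v in zip(counts[s], V)))
            / visits[s]
            if visits[s] > 0 else 0.0        # unvisited states stay at 0
            for s in range(S)
        ]
    return V


def exact_value(P, r, gamma, sweeps=2000):
    """Reference value function via iteration of the true Bellman operator."""
    S = len(r)
    V = [0.0] * S
    for _ in range(sweeps):
        V = [r[s] + gamma * sum(P[s][j] * V[j] for j in range(S)) for s in range(S)]
    return V


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 3-state chain and rewards, chosen arbitrarily for the demonstration.
    P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
    r = [1.0, 0.0, 2.0]
    gamma = 0.9
    path = simulate_path(P, 20000, rng)
    print("exact:", exact_value(P, r, gamma))
    for K in (1, 5):
        print("K =", K, k_step_td(path, r, gamma, K))
```

In this well-specified toy setting the estimates for different $K$ should be close to each other and to the exact values. The interaction between look-ahead, mixing time, and mis-specification described in the abstract only appears once the function class cannot represent the true value function, which this sketch does not attempt to reproduce.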
Related papers
- Markov Chain Variance Estimation: A Stochastic Approximation Approach [14.883782513177094]
We consider the problem of estimating the variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean.
We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require any historical samples or any prior knowledge of run-length, and has the optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite-sample guarantees.
arXiv Detail & Related papers (2024-09-09T15:42:28Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function
Estimation in Off-policy Evaluation [1.575865518040625]
We study the off-policy evaluation problem in an infinite-horizon Markov decision process with continuous states and actions.
We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem.
arXiv Detail & Related papers (2022-01-17T01:09:38Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Improved Prediction and Network Estimation Using the Monotone Single
Index Multi-variate Autoregressive Model [34.529641317832024]
We develop a semi-parametric approach based on the monotone single-index multi-variate autoregressive model (SIMAM).
We provide theoretical guarantees for dependent data and an alternating projected gradient descent algorithm.
We demonstrate the superior performance both on simulated data and two real data examples.
arXiv Detail & Related papers (2021-06-28T12:32:29Z) - Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)