Bootstrapping Statistical Inference for Off-Policy Evaluation
- URL: http://arxiv.org/abs/2102.03607v2
- Date: Tue, 9 Feb 2021 11:19:15 GMT
- Title: Bootstrapping Statistical Inference for Off-Policy Evaluation
- Authors: Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
- Abstract summary: We study the use of bootstrapping in off-policy evaluation (OPE).
We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is efficient and consistent for off-policy statistical inference.
We evaluate the bootstrapping method in classical RL environments for confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
- Score: 43.79456564713911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bootstrapping provides a flexible and effective approach for assessing the
quality of batch reinforcement learning, yet its theoretical properties are less
understood. In this paper, we study the use of bootstrapping in off-policy
evaluation (OPE), and in particular, we focus on the fitted Q-evaluation (FQE)
that is known to be minimax-optimal in the tabular and linear-model cases. We
propose a bootstrapping FQE method for inferring the distribution of the policy
evaluation error and show that this method is asymptotically efficient and
distributionally consistent for off-policy statistical inference. To overcome
the computational burden of bootstrapping, we further adapt a subsampling
procedure that improves the runtime by an order of magnitude. We numerically
evaluate the bootstrapping method in classical RL environments for confidence
interval estimation, estimating the variance of an off-policy evaluator, and
estimating the correlation between multiple off-policy evaluators.
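As a rough illustration of the procedure described in the abstract, the sketch below bootstraps a tabular FQE estimator in a small synthetic MDP and forms a percentile confidence interval and a variance estimate from the bootstrap replicates; a subsample size m < n stands in for the runtime-saving subsampling step. The MDP, the data-collection shortcut, and the sqrt(m/n) rescaling are illustrative assumptions rather than the authors' implementation.

```python
# Minimal illustrative sketch of bootstrapped fitted Q-evaluation (FQE) with a
# subsampled bootstrap, in a small synthetic tabular MDP. This is NOT the
# authors' code: the MDP, the i.i.d. data-collection shortcut, and the
# sqrt(m/n) rescaling of subsample deviations are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P(s'|s,a)
R = rng.uniform(size=(S, A))                 # reward function
pi = rng.dirichlet(np.ones(A), size=S)       # target policy pi(a|s)
mu = np.full((S, A), 1.0 / A)                # uniform behavior policy
d0 = np.full(S, 1.0 / S)                     # initial state distribution

def collect(n):
    """Log n off-policy transitions (s, a, r, s') under the behavior policy."""
    s = rng.integers(S, size=n)
    a = np.array([rng.choice(A, p=mu[si]) for si in s])
    s2 = np.array([rng.choice(S, p=P[si, ai]) for si, ai in zip(s, a)])
    return s, a, R[s, a], s2

def fqe(s, a, r, s2, iters=100):
    """Tabular FQE: repeatedly regress r + gamma * E_{a'~pi} Q(s', a')."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        target = r + gamma * (pi[s2] * Q[s2]).sum(axis=1)
        Qnew = Q.copy()
        for si in range(S):
            for ai in range(A):
                m = (s == si) & (a == ai)
                if m.any():
                    Qnew[si, ai] = target[m].mean()
        Q = Qnew
    return float((d0[:, None] * pi * Q).sum())   # estimated value of pi

n = 2000
s, a, r, s2 = collect(n)
point = fqe(s, a, r, s2)

# Bootstrap the FQE estimate. With m < n each replicate refits FQE on a small
# subsample (the runtime-saving variant); deviations are rescaled by sqrt(m/n)
# so they mimic the full-sample estimation error.
def bootstrap_errors(B=200, m=None):
    m = n if m is None else m
    reps = []
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=True)
        reps.append(fqe(s[idx], a[idx], r[idx], s2[idx]))
    return (np.asarray(reps) - point) * np.sqrt(m / n)

err = bootstrap_errors(B=200, m=n // 4)
ci = (point - np.percentile(err, 97.5), point - np.percentile(err, 2.5))
print(f"FQE point estimate     : {point:.3f}")
print(f"bootstrap 95% CI       : ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"bootstrap variance est.: {err.var():.5f}")
# Running two different evaluators on the same bootstrap indices would likewise
# give an estimate of the correlation between their errors.
```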
Related papers
- Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning [7.875680651592574]
We develop an online robust policy evaluation procedure and establish the limiting distribution of our estimator based on its Bahadur representation.
This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation.
arXiv Detail & Related papers (2023-10-04T04:57:35Z)
- A Tale of Sampling and Estimation in Discounted Reinforcement Learning [50.43256303670011]
We present a minimax lower bound on the discounted mean estimation problem.
We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
arXiv Detail & Related papers (2023-04-11T09:13:17Z)
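The summary above ("A Tale of Sampling and Estimation in Discounted Reinforcement Learning") mentions estimating the discounted mean by sampling directly from the discounted kernel. One common reading of that idea, sketched below under illustrative assumptions (a random chain MDP, geometric stopping with probability 1 - gamma), is to stop a rollout at a geometric time and rescale the reward observed at the stopping step by 1/(1 - gamma); this is not the paper's experimental setup.

```python
# Hedged sketch: estimate a discounted mean by sampling the discounted kernel.
# Stop a rollout with probability (1 - gamma) at every step and use the reward
# at the stopping time, rescaled by 1 / (1 - gamma). The random MDP below is an
# illustrative assumption, not the paper's setup.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 4, 2, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel
R = rng.uniform(size=(S, A))                 # rewards
pi = rng.dirichlet(np.ones(A), size=S)       # policy to evaluate
s0 = 0

def one_sample():
    """One draw of r(s_T, a_T) / (1 - gamma) with T ~ Geometric(1 - gamma)."""
    s = s0
    while True:
        a = rng.choice(A, p=pi[s])
        if rng.random() < 1.0 - gamma:        # stop: (s, a) ~ discounted kernel
            return R[s, a] / (1.0 - gamma)
        s = rng.choice(S, p=P[s, a])

est = np.mean([one_sample() for _ in range(20000)])

# Exact value for comparison: V = (I - gamma * P_pi)^{-1} r_pi evaluated at s0.
P_pi = np.einsum('sa,sap->sp', pi, P)
r_pi = (pi * R).sum(axis=1)
V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
print(f"geometric-stopping estimate {est:.3f} vs exact {V[s0]:.3f}")
```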
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
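The VA-OPE summary above reweights the Bellman residual by an estimated variance. The sketch below shows only the core weighted-regression step on synthetic linear-feature data; the features, targets, and the assumption that per-sample noise scales are available are illustrative and not the paper's actual estimator.

```python
# Hedged sketch of a variance-weighted regression step, as in fitted Q-style
# methods: Bellman targets with larger estimated variance get smaller weight.
# Features, data, and the known noise scales are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3
X = rng.normal(size=(n, d))                       # phi(s_i, a_i): features
theta_true = np.array([1.0, -0.5, 2.0])
sigma = rng.uniform(0.1, 2.0, size=n)             # per-sample noise scale
y = X @ theta_true + sigma * rng.normal(size=n)   # noisy regression targets

# Ordinary least squares (uniform weights), as in vanilla FQI / FQE.
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Variance-aware weighting: w_i = 1 / sigma_i^2 (here sigma_i is assumed known;
# in practice it would itself be estimated from data).
W = 1.0 / sigma**2
theta_va = np.linalg.solve((X * W[:, None]).T @ X, (X * W[:, None]).T @ y)

print("OLS error     ", np.linalg.norm(theta_ols - theta_true))
print("weighted error", np.linalg.norm(theta_va - theta_true))
```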
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning, maintaining a set of parallel models and estimating the Q-value from a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
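The cross Q-learning summary above generalizes double Q-learning to a set of parallel models with a randomly selected model supplying the bootstrap target. The tabular sketch below illustrates one plausible form of that update; the toy MDP, the ensemble-greedy behavior, and the exact target construction are assumptions, and the paper itself works with deep Q-networks.

```python
# Hedged tabular sketch of a cross Q-learning style update: keep K parallel
# Q-tables and, when updating one of them, take the bootstrap value from a
# randomly selected other table (generalizing the double Q-learning trick).
# The toy MDP and exact target form are assumptions; the paper uses deep nets.
import numpy as np

rng = np.random.default_rng(3)
S, A, gamma, K, alpha, eps = 6, 3, 0.95, 4, 0.1, 0.2
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel
R = rng.uniform(size=(S, A))                 # rewards
Q = np.zeros((K, S, A))                      # K parallel estimators

s = 0
for step in range(50000):
    q_mean = Q.mean(axis=0)                  # act greedily w.r.t. the ensemble
    a = rng.integers(A) if rng.random() < eps else int(q_mean[s].argmax())
    r, s2 = R[s, a], rng.choice(S, p=P[s, a])
    i = rng.integers(K)                      # table to update
    j = int(rng.choice([k for k in range(K) if k != i]))  # target table
    a_star = int(Q[i, s2].argmax())          # action chosen by the updated table
    target = r + gamma * Q[j, s2, a_star]    # value taken from the other table
    Q[i, s, a] += alpha * (target - Q[i, s, a])
    s = s2

print("ensemble value of state 0:", Q.mean(axis=0)[0].max().round(3))
```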
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics [29.14119984573459]
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments.
Due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation.
We propose a new variational framework that reduces the problem of calculating tight confidence bounds in OPE to an optimization problem over a feasible set that contains the true value function with high probability.
arXiv Detail & Related papers (2020-08-15T07:24:38Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.