Statistical Bootstrapping for Uncertainty Estimation in Off-Policy
Evaluation
- URL: http://arxiv.org/abs/2007.13609v1
- Date: Mon, 27 Jul 2020 14:49:22 GMT
- Title: Statistical Bootstrapping for Uncertainty Estimation in Off-Policy
Evaluation
- Authors: Ilya Kostrikov and Ofir Nachum
- Abstract summary: We investigate the potential for statistical bootstrapping to be used as a way to produce calibrated confidence intervals for the true value of the policy.
We show that it can yield accurate confidence intervals in a variety of conditions, including challenging continuous control environments and small data regimes.
- Score: 38.31971190670345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, it is typical to use the empirically observed
transitions and rewards to estimate the value of a policy via either
model-based or Q-fitting approaches. Although straightforward, these techniques
in general yield biased estimates of the true value of the policy. In this
work, we investigate the potential for statistical bootstrapping to be used as
a way to take these biased estimates and produce calibrated confidence
intervals for the true value of the policy. We identify conditions -
specifically, sufficient data size and sufficient coverage - under which
statistical bootstrapping in this setting is guaranteed to yield correct
confidence intervals. In practical situations, these conditions often do not
hold, and so we discuss and propose mechanisms that can be employed to mitigate
their effects. We evaluate our proposed method and show that it can yield
accurate confidence intervals in a variety of conditions, including challenging
continuous control environments and small data regimes.
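As a rough illustration of the idea in the abstract, the sketch below resamples the logged transitions with replacement, refits a plug-in value estimator on each resample, and reports a percentile interval over the resulting estimates. This is a minimal sketch under simplifying assumptions, not the authors' implementation; `fit_value_estimate` and `bootstrap_policy_value_ci` are hypothetical names, and the placeholder estimator (mean logged reward) merely stands in for the model-based or Q-fitting estimators discussed in the paper. The paper's mitigations for limited data size and coverage are not reflected here.

```python
import numpy as np

def fit_value_estimate(transitions):
    # Hypothetical placeholder estimator: the mean logged reward, standing in
    # for a model-based or fitted-Q estimate of the target policy's value.
    rewards = np.array([t[2] for t in transitions], dtype=float)
    return rewards.mean()

def bootstrap_policy_value_ci(transitions, n_bootstrap=200, alpha=0.05, seed=0):
    """Resample logged transitions with replacement, refit the value estimator
    on each resample, and return a (1 - alpha) percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(transitions)
    estimates = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        estimates.append(fit_value_estimate([transitions[i] for i in idx]))
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Usage (assuming `logged_transitions` is a list of
# (state, action, reward, next_state) tuples from the behavior policy):
# lo, hi = bootstrap_policy_value_ci(logged_transitions)
```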
Related papers
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Evaluating the Effectiveness of Index-Based Treatment Allocation [42.040099398176665]
When resources are scarce, an allocation policy is needed to decide who receives a resource.
This paper introduces methods to evaluate index-based allocation policies using data from a randomized control trial.
arXiv Detail & Related papers (2024-02-19T01:55:55Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Kernel Conditional Moment Constraints for Confounding Robust Inference [22.816690686310714]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value.
arXiv Detail & Related papers (2023-02-26T16:44:13Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process [14.828039846764549]
We show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process.
Our method is justified by theoretical results, simulated and real datasets obtained from ridesharing companies.
arXiv Detail & Related papers (2022-02-22T00:03:48Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics [29.14119984573459]
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments.
Due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation.
We propose a new variational framework that reduces the computation of tight confidence bounds in OPE to an optimization problem.
arXiv Detail & Related papers (2020-08-15T07:24:38Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)