Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- URL: http://arxiv.org/abs/2204.00956v1
- Date: Sat, 2 Apr 2022 23:40:15 GMT
- Title: Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- Authors: David Bruns-Smith
- Abstract summary: In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy.
We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons.
We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
- Score: 7.858296711223292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When decision-makers can directly intervene, policy evaluation algorithms
give valid causal estimates. In off-policy evaluation (OPE), there may exist
unobserved variables that both impact the dynamics and are used by the unknown
behavior policy. These "confounders" will introduce spurious correlations and
naive estimates for a new policy will be biased. We develop worst-case bounds
to assess sensitivity to these unobserved confounders in finite horizons when
confounders are drawn iid each period. We demonstrate that a model-based
approach with robust MDPs gives sharper lower bounds by exploiting domain
knowledge about the dynamics. Finally, we show that when unobserved confounders
are persistent over time, OPE is far more difficult and existing techniques
produce extremely conservative bounds.
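To make the flavor of these bounds concrete, here is a minimal model-free sketch (an illustrative simplification, not the paper's exact construction): suppose each trajectory's nominal importance weight, computed from the estimated behavior policy, may be off by a factor of at most `gamma` because of an unobserved confounder drawn iid each period. The worst-case self-normalized estimate over that box of weights is a fractional program whose optimum has a simple threshold structure. `returns`, `weights`, and `gamma` are assumed inputs.

```python
import numpy as np

def worst_case_lower_bound(returns, weights, gamma):
    """Worst-case (lower) self-normalized importance-sampling estimate of a
    target policy's value when each nominal weight w_i may be wrong by a
    factor of at most `gamma` due to an unobserved confounder:
        minimize   sum(w_i * r_i) / sum(w_i)
        subject to w_i in [weights_i / gamma, weights_i * gamma].
    The minimizer gives the upper weight to the k smallest returns and the
    lower weight to the rest for some crossover k, so we scan all crossovers.
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    lo, hi = weights / gamma, weights * gamma

    order = np.argsort(returns)           # sort trajectories by return
    r, lo, hi = returns[order], lo[order], hi[order]

    best = np.inf
    for k in range(len(r) + 1):
        w = np.concatenate([hi[:k], lo[k:]])
        best = min(best, float(np.dot(w, r) / np.sum(w)))
    return best
```

The matching upper bound swaps the roles of the low and high weights. The model-based robust-MDP bounds described in the abstract instead propagate this kind of per-step uncertainty through an estimated dynamics model, which is how domain knowledge about the dynamics can tighten the result.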
Related papers
- Offline Recommender System Evaluation under Unobserved Confounding [5.4208903577329375]
Off-Policy Estimation methods allow us to learn and evaluate decision-making policies from logged data.
An important assumption that makes this work is the absence of unobserved confounders.
This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders.
arXiv Detail & Related papers (2023-09-08T09:11:26Z) - Hallucinated Adversarial Control for Conservative Offline Policy
Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
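As a rough illustration of the conservative model-based idea (not the HAMBO algorithm itself), one can evaluate the target policy under every member of an ensemble of learned dynamics models and keep the most pessimistic estimate; all function names below are assumptions.

```python
def ensemble_pessimistic_value(policy, models, start_states, horizon, gamma=0.99):
    """Crude conservative model-based OPE baseline (not HAMBO): roll the
    target policy through each learned dynamics model in an ensemble and
    report the smallest estimated return. `policy(s)` returns an action;
    each `model(s, a)` returns (next_state, reward). Names are illustrative."""
    values = []
    for model in models:
        total = 0.0
        for s0 in start_states:
            s, discount, ret = s0, 1.0, 0.0
            for _ in range(horizon):
                a = policy(s)
                s, r = model(s, a)
                ret += discount * r
                discount *= gamma
            total += ret
        values.append(total / len(start_states))
    return min(values)  # pessimistic estimate across ensemble members
```

HAMBO itself, per the summary above, uses adversarial control within the model's uncertainty to obtain provably valid lower bounds rather than this heuristic minimum.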
arXiv Detail & Related papers (2023-03-02T08:57:35Z) - Off-Policy Evaluation in Partially Observed Markov Decision Processes
under Sequential Ignorability [8.388782503421504]
We consider off-policy evaluation of dynamic treatment rules under sequential ignorability.
We show that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes.
arXiv Detail & Related papers (2021-10-24T03:35:23Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and
Dual Bounds [21.520045697447372]
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies.
This work considers the problem of constructing non-asymptotic confidence intervals in infinite-horizon off-policy evaluation.
We develop a practical algorithm through a primal-dual optimization-based approach.
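As a point of contrast, the simplest non-asymptotic interval applies Hoeffding's inequality to bounded importance-weighted returns; it is typically very loose in long- or infinite-horizon problems, which is part of what motivates the primal-dual construction. The helper below is an illustrative baseline, not the paper's algorithm.

```python
import numpy as np

def hoeffding_interval(weighted_returns, upper_bound, delta=0.05):
    """Hoeffding confidence interval for the mean of per-trajectory
    importance-weighted returns assumed to lie in [0, upper_bound].
    Loose when importance weights (and hence upper_bound) are large."""
    x = np.asarray(weighted_returns, dtype=float)
    half_width = upper_bound * np.sqrt(np.log(2.0 / delta) / (2.0 * len(x)))
    return x.mean() - half_width, x.mean() + half_width
```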
arXiv Detail & Related papers (2021-03-09T22:31:20Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with
Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the generalization behavior of overparameterized models observed in supervised learning also occurs for offline contextual bandits.
We show that the discrepancy between value-based and policy-based objectives is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
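A minimal sketch of the kernelization idea for the plain importance-sampling term (the paper's estimators are doubly robust and more involved): the Dirac mass at the deterministic policy's action is replaced by a Gaussian kernel of bandwidth `h`, so the nonexistent density ratio is never formed. All names below are assumptions.

```python
import math

def kernel_is_value(data, target_policy, behavior_density, h):
    """Kernel-smoothed importance-sampling estimate of the value of a
    deterministic policy with continuous (scalar) actions. `data` yields
    (context, action, reward) triples, `target_policy(x)` returns the
    deterministic action, and `behavior_density(a, x)` is the estimated
    behavior-policy density. Illustrative sketch only."""
    total, n = 0.0, 0
    for x, a, r in data:
        # Gaussian kernel centered at the target policy's action
        k = math.exp(-0.5 * ((a - target_policy(x)) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
        total += (k / behavior_density(a, x)) * r
        n += 1
    return total / n
```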
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Off-policy Policy Evaluation For Sequential Decisions Under Unobserved
Confounding [33.58862183373374]
We assess robustness of OPE methods under unobserved confounding.
We show that even small amounts of per-decision confounding can heavily bias OPE methods.
We propose an efficient loss-minimization-based procedure for computing worst-case bounds.
arXiv Detail & Related papers (2020-03-12T05:20:37Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning applications such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.