Off-policy Policy Evaluation For Sequential Decisions Under Unobserved
Confounding
- URL: http://arxiv.org/abs/2003.05623v1
- Date: Thu, 12 Mar 2020 05:20:37 GMT
- Title: Off-policy Policy Evaluation For Sequential Decisions Under Unobserved
Confounding
- Authors: Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
- Abstract summary: We assess robustness of OPE methods under unobserved confounding.
We show that even small amounts of per-decision confounding can heavily bias OPE methods.
We propose an efficient loss-minimization-based procedure for computing worst-case bounds.
- Score: 33.58862183373374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When observed decisions depend only on observed features, off-policy policy
evaluation (OPE) methods for sequential decision making problems can estimate
the performance of evaluation policies before deploying them. This assumption
is frequently violated due to unobserved confounders, unrecorded variables that
impact both the decisions and their outcomes. We assess robustness of OPE
methods under unobserved confounding by developing worst-case bounds on the
performance of an evaluation policy. When unobserved confounders can affect
every decision in an episode, we demonstrate that even small amounts of
per-decision confounding can heavily bias OPE methods. Fortunately, in a number
of important settings found in healthcare, policy-making, operations, and
technology, unobserved confounders may primarily affect only one of the many
decisions made. Under this less pessimistic model of one-decision confounding,
we propose an efficient loss-minimization-based procedure for computing
worst-case bounds, and prove its statistical consistency. On two simulated
healthcare examples---management of sepsis patients and developmental
interventions for autistic children---where this is a reasonable model of
confounding, we demonstrate that our method invalidates non-robust results and
provides meaningful certificates of robustness, allowing reliable selection of
policies even under unobserved confounding.
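The paper's loss-minimization procedure for one-decision confounding is not reproduced here, but the flavor of a worst-case bound can be shown with a short sketch. The code below is an illustrative assumption rather than the authors' method: it computes worst-case lower and upper self-normalized importance-sampling estimates of policy value when confounding may rescale each trajectory's importance weight by an unknown factor in [1/Gamma, Gamma], where Gamma >= 1 is a user-chosen confounding strength. The function name, sensitivity model, and synthetic data are all hypothetical.

```python
import numpy as np

def worst_case_value_bound(returns, weights, gamma=2.0, lower=True):
    """Worst-case self-normalized importance-sampling estimate when unobserved
    confounding may rescale each trajectory's importance weight by an unknown
    factor in [1/gamma, gamma] (gamma >= 1). Illustrative sketch only.

    returns : per-trajectory returns G_i observed in the logged data
    weights : per-trajectory importance weights w_i (evaluation / behavior)
    """
    G = np.asarray(returns, dtype=float)
    w = np.asarray(weights, dtype=float)
    if not lower:
        # The upper bound is the negated lower bound on the negated returns.
        return -worst_case_value_bound(-G, w, gamma, lower=True)

    order = np.argsort(G)                 # sort trajectories by return
    G, w = G[order], w[order]
    n = len(G)
    best = np.inf
    # The adversary minimizes sum(lam*w*G) / sum(lam*w) over lam in [1/gamma, gamma]^n.
    # The minimizer inflates the weights of low-return trajectories (factor gamma)
    # and deflates the rest (factor 1/gamma), so scanning the n+1 threshold
    # positions in sorted order recovers the exact worst case.
    for k in range(n + 1):
        lam = np.concatenate([np.full(k, gamma), np.full(n - k, 1.0 / gamma)])
        val = np.dot(lam * w, G) / np.dot(lam, w)
        best = min(best, val)
    return best

# Usage on synthetic logged data (purely illustrative numbers).
rng = np.random.default_rng(0)
G = rng.normal(loc=1.0, scale=0.5, size=200)      # returns
w = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # importance weights
print("point estimate :", np.dot(w, G) / w.sum())
print("worst-case LB  :", worst_case_value_bound(G, w, gamma=1.5))
print("worst-case UB  :", worst_case_value_bound(G, w, gamma=1.5, lower=False))
```

Because the worst-case weighted mean is attained at a vertex of the box constraint with a threshold structure in the returns, the scan over thresholds is exact; the gap between the point estimate and the bounds grows with the assumed confounding strength Gamma.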
Related papers
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z) - Predictive Performance Comparison of Decision Policies Under Confounding [32.21041697921289]
We propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches.
Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison.
arXiv Detail & Related papers (2024-04-01T01:27:07Z) - Explaining by Imitating: Understanding Decisions by Interpretable Policy
Learning [72.80902932543474]
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that is transparent by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z) - Causal Inference under Data Restrictions [0.0]
This dissertation focuses on modern causal inference under uncertainty and data restrictions.
It includes applications to neoadjuvant clinical trials, distributed data networks, and robust individualized decision making.
arXiv Detail & Related papers (2023-01-20T20:14:32Z) - Model-Free and Model-Based Policy Evaluation when Causality is Uncertain [7.858296711223292]
In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy.
We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons.
We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
arXiv Detail & Related papers (2022-04-02T23:40:15Z) - Identification of Subgroups With Similar Benefits in Off-Policy Policy
Evaluation [60.71312668265873]
We develop a method to balance the need for personalization with confident predictions.
We show that our method can be used to form accurate predictions of heterogeneous treatment effects.
arXiv Detail & Related papers (2021-11-28T23:19:12Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in
Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with
Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Identifying Causal-Effect Inference Failure with Uncertainty-Aware
Models [41.53326337725239]
We introduce a practical approach for integrating uncertainty estimation into a class of state-of-the-art neural network methods.
We show that our methods enable us to deal gracefully with situations of "no-overlap", common in high-dimensional data.
We show that correctly modeling uncertainty can keep us from giving overconfident and potentially harmful recommendations.
arXiv Detail & Related papers (2020-07-01T00:37:41Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.