Generalizing Off-Policy Evaluation From a Causal Perspective For
Sequential Decision-Making
- URL: http://arxiv.org/abs/2201.08262v1
- Date: Thu, 20 Jan 2022 16:13:16 GMT
- Title: Generalizing Off-Policy Evaluation From a Causal Perspective For
Sequential Decision-Making
- Authors: Sonali Parbhoo, Shalmali Joshi, Finale Doshi-Velez
- Abstract summary: We argue that explicitly highlighting the association has important implications on our understanding of the fundamental limits of OPE.
We demonstrate how this association motivates natural desiderata to consider a general set of causal estimands.
We discuss each of these aspects as actionable desiderata for future OPE research at scale and in line with practical utility.
- Score: 32.06576007608403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assessing the effects of a policy based on observational data from a
different policy is a common problem across several high-stakes decision-making
domains, and several off-policy evaluation (OPE) techniques have been proposed.
However, these methods largely formulate OPE as a problem disassociated from
the process used to generate the data (i.e. structural assumptions in the form
of a causal graph). We argue that explicitly highlighting this association has
important implications for our understanding of the fundamental limits of OPE.
First, this implies that the current formulation of OPE corresponds to a narrow
set of tasks, i.e. a specific causal estimand focused on prospective
evaluation of policies over populations or sub-populations. Second, we
demonstrate how this association motivates natural desiderata to consider a
general set of causal estimands, particularly extending the role of OPE for
counterfactual off-policy evaluation at the level of individuals of the
population. A precise description of the causal estimand highlights which OPE
estimands are identifiable from observational data under the stated generative
assumptions. For those OPE estimands that are not identifiable, the causal
perspective further highlights where more experimental data is necessary, and
highlights situations where human expertise can aid identification and
estimation. Furthermore, many formalisms of OPE entirely overlook the role of
uncertainty in the estimation process. We demonstrate how specifically
characterising the causal estimand highlights the different sources of
uncertainty and when human expertise can naturally manage this uncertainty. We
discuss each of these aspects as actionable desiderata for future OPE research
at scale and in line with practical utility.
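To make the population-level estimand discussed in the abstract concrete, here is a minimal sketch of ordinary importance-sampling OPE over logged trajectories. It assumes full observability, known behavior-policy propensities, and illustrative input formats (the `trajectories` structure and the `target_policy` callable are hypothetical); it is not the estimator or causal machinery of the paper, only the narrow prospective estimand that the abstract contrasts with individual-level counterfactual evaluation.

```python
import numpy as np

def importance_sampling_ope(trajectories, target_policy, gamma=0.99):
    """Estimate the population-level value of a target policy from data
    logged under a different behavior policy (ordinary importance sampling).

    Each trajectory is a list of (state, action, reward, behavior_prob) tuples,
    and target_policy(state, action) returns the target policy's probability of
    that action; both formats are illustrative assumptions, not the paper's API.
    """
    estimates = []
    for trajectory in trajectories:
        weight, discounted_return = 1.0, 0.0
        for t, (state, action, reward, behavior_prob) in enumerate(trajectory):
            # Cumulative likelihood ratio between target and behavior policies.
            weight *= target_policy(state, action) / behavior_prob
            discounted_return += (gamma ** t) * reward
        estimates.append(weight * discounted_return)
    # This is a prospective, population-level causal estimand (an average over
    # the population). Individual-level counterfactual estimands are a different
    # object and are generally not identifiable from this computation alone.
    return float(np.mean(estimates))
```

Naming the estimand this precisely is what makes the identifiability and uncertainty questions in the abstract explicit: the same logged data that support this population average need not identify individual-level counterfactual quantities.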
Related papers
- Challenges and Considerations in the Evaluation of Bayesian Causal Discovery [49.0053848090947]
Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making.
Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, Bayesian causal discovery presents evaluation challenges due to the nature of its inferred quantity, the posterior over causal graphs and model parameters.
There is no consensus on the most suitable metric for this evaluation.
arXiv Detail & Related papers (2024-06-05T12:45:23Z) - When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective [64.73162159837956]
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging.
We propose DataCOPE, a data-centric framework for evaluating a target policy given a dataset.
Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies.
arXiv Detail & Related papers (2023-11-23T17:13:37Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z) - Identification of Subgroups With Similar Benefits in Off-Policy Policy
Evaluation [60.71312668265873]
We develop a method to balance the need for personalization with confident predictions.
We show that our method can be used to form accurate predictions of heterogeneous treatment effects.
arXiv Detail & Related papers (2021-11-28T23:19:12Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in
Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP)
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Projected State-action Balancing Weights for Offline Reinforcement
Learning [9.732863739456034]
This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy.
We propose a novel estimator with approximately projected state-action balancing weights for the policy value estimation.
Numerical experiments demonstrate the promising performance of our proposed estimator.
arXiv Detail & Related papers (2021-09-10T03:00:44Z) - Interpretable Off-Policy Evaluation in Reinforcement Learning by
Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
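As a rough illustration of the idea behind the last entry above (flagging which logged trajectories most influence an off-policy estimate so that human experts can audit them), here is a minimal leave-one-out influence sketch for a weighted importance-sampling estimate. The inputs `weights` and `returns` are assumed to be precomputed per trajectory; this is a generic diagnostic under those assumptions, not the method proposed in that paper.

```python
import numpy as np

def trajectory_influence(weights, returns):
    """Leave-one-out influence of each logged trajectory on a weighted
    importance-sampling (WIS) off-policy estimate.

    weights[i] is the cumulative importance weight of trajectory i and
    returns[i] is its observed return; both are assumed precomputed.
    """
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    full_estimate = np.sum(weights * returns) / np.sum(weights)

    influences = np.empty(len(weights))
    for i in range(len(weights)):
        keep = np.ones(len(weights), dtype=bool)
        keep[i] = False
        # Re-estimate without trajectory i; the gap is its influence.
        loo_estimate = np.sum(weights[keep] * returns[keep]) / np.sum(weights[keep])
        influences[i] = full_estimate - loo_estimate
    return full_estimate, influences

# Example: surface the most influential trajectories for expert review.
value, influence = trajectory_influence([0.2, 1.5, 0.9], [10.0, 3.0, 7.0])
most_influential = np.argsort(-np.abs(influence))
```

Trajectories with large influence scores are natural candidates for the kind of human inspection that entry describes, since a single heavily weighted transition can dominate the estimate.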