Policy Learning with Asymmetric Counterfactual Utilities
- URL: http://arxiv.org/abs/2206.10479v3
- Date: Tue, 28 Nov 2023 16:23:08 GMT
- Title: Policy Learning with Asymmetric Counterfactual Utilities
- Authors: Eli Ben-Michael and Kosuke Imai and Zhichao Jiang
- Abstract summary: We consider optimal policy learning with asymmetric counterfactual utility functions.
We derive minimax decision rules by minimizing the maximum expected utility loss.
We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems.
- Score: 0.6138671548064356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven decision making plays an important role even in high stakes
settings like medicine and public policy. Learning optimal policies from
observed data requires a careful formulation of the utility function whose
expected value is maximized across a population. Although researchers typically
use utilities that depend on observed outcomes alone, in many settings the
decision maker's utility function is more properly characterized by the joint
set of potential outcomes under all actions. For example, the Hippocratic
principle to "do no harm" implies that the cost of causing death to a patient
who would otherwise survive without treatment is greater than the cost of
forgoing life-saving treatment. We consider optimal policy learning with
asymmetric counterfactual utility functions of this form that consider the
joint set of potential outcomes. We show that asymmetric counterfactual
utilities lead to an unidentifiable expected utility function, and so we first
partially identify it. Drawing on statistical decision theory, we then derive
minimax decision rules by minimizing the maximum expected utility loss relative
to different alternative policies. We show that one can learn minimax loss
decision rules from observed data by solving intermediate classification
problems, and establish that the finite sample excess expected utility loss of
this procedure is bounded by the regret of these intermediate classifiers. We
apply this conceptual framework and methodology to the decision about whether
or not to use right heart catheterization for patients with possible pulmonary
hypertension.
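The abstract's core logic can be illustrated with a small numerical sketch. Assuming binary treatment and binary survival, the marginals P(Y(1)=1) and P(Y(0)=1) are identifiable from data, but the harm probability P(Y(1)=0, Y(0)=1) is only bounded (Fréchet-Hoeffding), so the expected utility of treating is partially identified and a minimax rule compares worst-case losses. The specific utility form and `harm_cost` value below are hypothetical simplifications, not the paper's exact specification:

```python
def joint_bounds(p1, p0):
    """Frechet-Hoeffding bounds on q = P(Y(1)=0, Y(0)=1), the probability
    that treatment harms a patient who would otherwise survive.
    p1 = P(Y(1)=1), p0 = P(Y(0)=1) are the identifiable marginals."""
    lo = max(0.0, p0 - p1)
    hi = min(p0, 1.0 - p1)
    return lo, hi

def utility_bounds_treat(p1, p0, harm_cost=2.0):
    """Bounds on the expected utility of treating:
    U = P(Y(1)=1) - harm_cost * q, where the extra asymmetric penalty
    harm_cost applies only to the counterfactual-harm cell."""
    lo_q, hi_q = joint_bounds(p1, p0)
    return p1 - harm_cost * hi_q, p1 - harm_cost * lo_q

def minimax_action(p1, p0, harm_cost=2.0):
    """Pick the action minimizing the maximum expected utility loss
    relative to the alternative, over the identified set."""
    u_treat_lo, u_treat_hi = utility_bounds_treat(p1, p0, harm_cost)
    u_control = p0  # utility of forgoing treatment is point-identified
    loss_treat = max(0.0, u_control - u_treat_lo)    # worst case for treating
    loss_control = max(0.0, u_treat_hi - u_control)  # worst case for not treating
    return 1 if loss_treat < loss_control else 0

# clearly beneficial treatment -> treat; symmetric marginals with a large
# asymmetric harm penalty -> "do no harm" favors not treating
print(minimax_action(0.9, 0.5, harm_cost=2.0),
      minimax_action(0.5, 0.5, harm_cost=10.0))
```

Even with identical marginal survival rates, a large enough harm penalty flips the decision toward inaction, which is the asymmetry the Hippocratic example describes.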
Related papers
- Policy Learning with Distributional Welfare [1.0742675209112622]
Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates treatment based on the conditional quantile of individual treatment effects (QoTE).
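The contrast between ATE-based and quantile-based allocation can be sketched numerically. The toy setup below (two covariate groups with equal average effect but different spread, and individual effects assumed directly observable, which they are not in practice) is hypothetical; the actual method estimates conditional quantiles from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two covariate groups with the same average treatment
# effect (0.1) but very different effect spread.
n = 10_000
group = rng.integers(0, 2, n)
effects = np.where(group == 0,
                   rng.normal(0.1, 0.1, n),   # group 0: tight effects
                   rng.normal(0.1, 1.0, n))   # group 1: dispersed effects

def qote_rule(effects, group, tau=0.25):
    """Treat a group iff the empirical tau-quantile of its individual
    treatment effects is positive. An ATE-based utilitarian rule would
    treat both groups here, since both have the same positive mean."""
    return {g: float(np.quantile(effects[group == g], tau)) > 0
            for g in np.unique(group)}

decisions = qote_rule(effects, group)
print(decisions)  # group 0 treated, group 1 not: its 0.25-quantile is negative
```

The quantile criterion withholds treatment from the high-variance group because a quarter of its members are harmed, even though the group-level average effect is identical.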
arXiv Detail & Related papers (2023-11-27T14:51:30Z) - Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
arXiv Detail & Related papers (2023-09-12T20:45:30Z) - Inference for relative sparsity [0.0]
We develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine.
Inference is difficult, because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case.
To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference.
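One way to picture the relative sparsity objective is as a count of departures from the behavioral (standard-of-care) policy. The operationalization below, including the threshold `eps`, is a hypothetical simplification for illustration, not the paper's penalty:

```python
import numpy as np

def relative_sparsity(pi_new, pi_behavior, eps=0.05):
    """Count covariate strata where the learned policy's treatment
    probability deviates from the behavioral policy by more than eps --
    a hypothetical stand-in for 'number of changes to current practice'."""
    pi_new, pi_behavior = np.asarray(pi_new), np.asarray(pi_behavior)
    return int(np.sum(np.abs(pi_new - pi_behavior) > eps))

# treatment probabilities over 4 patient strata
behavior = [0.5, 0.5, 0.5, 0.5]
learned  = [0.5, 0.9, 0.52, 0.1]
print(relative_sparsity(learned, behavior))  # deviates materially in 2 strata
```

A penalty of this shape rewards policies that change clinical practice in only a few places, which is why characterizing the uncertainty of the resulting selections matters.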
arXiv Detail & Related papers (2023-06-25T17:14:45Z) - Optimal Treatment Regimes for Proximal Causal Learning [7.672587258250301]
We propose a novel optimal individualized treatment regime based on outcome and treatment confounding bridges.
We show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature.
arXiv Detail & Related papers (2022-12-19T14:29:25Z) - Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z) - Optimal discharge of patients from intensive care via a data-driven policy learning framework [58.720142291102135]
It is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay and the risk of readmission or even death following the discharge decision.
This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions.
A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition.
arXiv Detail & Related papers (2021-12-17T04:39:33Z) - Median Optimal Treatment Regimes [7.241149193573696]
We propose a new median optimal treatment regime that treats individuals whose conditional median is higher under treatment.
This ensures that optimal decisions for individuals from the same group are not overly influenced by a small fraction of the group.
We introduce a new measure of value, the Average Conditional Median Effect (ACME), which summarizes across-group median treatment outcomes of a policy.
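The motivation for a median criterion is that a mean-based rule can be dominated by a few extreme responders. The toy contrast below compares the two criteria within a single covariate group; the data are fabricated for illustration, and the joint outcomes (both potential outcomes per group) are assumed available only to make the comparison visible:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical group: most patients do slightly worse under treatment,
# but a few extreme responders inflate the mean treated outcome.
y1 = np.concatenate([rng.normal(-0.2, 0.1, 95),   # typical patients
                     rng.normal(10.0, 1.0, 5)])   # rare extreme responders
y0 = rng.normal(0.0, 0.1, 100)                    # outcomes without treatment

mean_rule   = y1.mean()   > y0.mean()     # ATE-style rule: treat the group
median_rule = np.median(y1) > np.median(y0)  # median rule: do not treat
print(bool(mean_rule), bool(median_rule))
```

The median rule protects the typical group member from a decision driven by a 5% tail, which is exactly the "not overly influenced by a small fraction of the group" property the summary describes.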
arXiv Detail & Related papers (2021-03-02T15:26:20Z) - Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
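As background for this entry, the basic importance-weighted off-policy estimator for bandits with bounded rewards can be sketched as follows; the numbers are synthetic, and the paper's contribution is minimax rate-optimal procedures beyond this plain baseline:

```python
import numpy as np

rng = np.random.default_rng(2)

K, n = 3, 50_000
mu       = np.array([0.2, 0.5, 0.8])   # true mean rewards, bounded in [0, 1]
behavior = np.array([0.5, 0.3, 0.2])   # logging (behavior) policy
target   = np.array([0.1, 0.1, 0.8])   # target policy to evaluate

# log data under the behavior policy
actions = rng.choice(K, size=n, p=behavior)
rewards = rng.binomial(1, mu[actions])

# importance-weighted (IPW) estimate of the target policy's value
weights = target[actions] / behavior[actions]
v_hat  = float(np.mean(weights * rewards))
v_true = float(target @ mu)
print(round(v_hat, 3), round(v_true, 3))
```

With known behavior probabilities the estimator is unbiased; the hard statistical questions (optimal rates under unknown or poorly overlapping behavior policies) are what the paper addresses.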
arXiv Detail & Related papers (2021-01-19T18:55:29Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
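The density-ratio problem this entry describes is concrete: a deterministic policy puts a point mass on one action, so the importance ratio against a continuous logging density is undefined. A kernel smooths the point mass into a narrow window. The sketch below shows only the kernel-smoothed importance-sampling piece with a boxcar kernel under a Uniform(0,1) logging policy; it is a simplified stand-in, not the paper's doubly robust estimators:

```python
import numpy as np

rng = np.random.default_rng(3)

# logged data: continuous actions a ~ Uniform(0, 1), noisy reward f(a) + eps
n = 200_000
a = rng.uniform(0, 1, n)
r = np.sin(2 * np.pi * a) + rng.normal(0, 0.1, n)

def kernel_is_value(a, r, target_a, h=0.02):
    """Kernel-smoothed importance estimator of the value of the
    deterministic policy 'always play target_a'. The exact density ratio
    does not exist for a point mass, so a boxcar kernel of bandwidth h
    replaces the indicator; the Uniform(0,1) behavior density equals 1."""
    k = (np.abs(a - target_a) <= h) / (2 * h)   # boxcar kernel weights
    return float(np.mean(k * r))

est = kernel_is_value(a, r, target_a=0.25)
print(round(est, 2))   # true value is sin(pi/2) = 1
```

Shrinking `h` reduces bias but inflates variance, which is the usual bandwidth trade-off; the paper's doubly robust constructions combine such kernel weights with an outcome model.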
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.