Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
- URL: http://arxiv.org/abs/2111.14272v1
- Date: Sun, 28 Nov 2021 23:19:12 GMT
- Title: Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
- Authors: Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill
- Abstract summary: We develop a method to balance the need for personalization with confident predictions.
We show that our method can be used to form accurate predictions of heterogeneous treatment effects.
- Score: 60.71312668265873
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Off-policy policy evaluation methods for sequential decision making can be
used to help identify if a proposed decision policy is better than a current
baseline policy. However, a new decision policy may be better than a baseline
policy for some individuals but not others. This has motivated a push towards
personalization and accurate per-state estimates of heterogeneous treatment
effects (HTEs). Given the limited data present in many important applications,
individual predictions can come at a cost to accuracy and confidence in such
predictions. We develop a method to balance the need for personalization with
confident predictions by identifying subgroups where it is possible to
confidently estimate the expected difference in a new decision policy relative
to a baseline. We propose a novel loss function that accounts for uncertainty
during the subgroup partitioning phase. In experiments, we show that our method
can be used to form accurate predictions of HTEs where other methods struggle.
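The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than the paper's method: given per-individual estimates of the benefit of the new policy over the baseline (for example, from importance sampling), search for a partition whose loss trades off within-group heterogeneity against confidence-interval width. The specific loss, the greedy single-feature split, and all names (partition_loss, best_single_split, ci_penalty) are assumptions made for illustration.

```python
# Hypothetical sketch: partition individuals into subgroups over which the
# benefit of a new policy vs. a baseline can be estimated with confidence.
# The loss below (within-group spread + CI-width penalty) is an assumption
# for illustration, not the loss proposed in the paper.
import numpy as np

def group_stats(effects):
    """Mean benefit and a rough 95% CI half-width for one subgroup."""
    n = len(effects)
    mean = effects.mean()
    half_width = 1.96 * effects.std(ddof=1) / np.sqrt(n) if n > 1 else np.inf
    return mean, half_width

def partition_loss(effects_by_group, ci_penalty=1.0):
    """Trade off within-group spread against confidence-interval width."""
    loss = 0.0
    for effects in effects_by_group:
        mean, half_width = group_stats(effects)
        loss += ((effects - mean) ** 2).sum() + ci_penalty * len(effects) * half_width
    return loss

def best_single_split(x, effects, ci_penalty=1.0):
    """Greedy 1-D split: choose the threshold on feature x minimising the loss."""
    best = (None, partition_loss([effects], ci_penalty))
    for t in np.unique(x)[:-1]:
        left, right = effects[x <= t], effects[x > t]
        if len(left) < 2 or len(right) < 2:
            continue
        loss = partition_loss([left, right], ci_penalty)
        if loss < best[1]:
            best = (t, loss)
    return best

# Toy usage: per-individual estimated benefits (e.g., from importance sampling).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)                                  # one covariate
effects = np.where(x > 0.5, 1.0, -0.5) + rng.normal(0, 1, 200)
threshold, loss = best_single_split(x, effects)
print(f"split at x <= {threshold}, loss {loss:.1f}")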
Related papers
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty (a generic split-conformal sketch appears after this list).
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Conformal Off-Policy Prediction [14.83348592874271]
We develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state.
Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy.
arXiv Detail & Related papers (2022-06-14T09:31:18Z)
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment [0.0]
We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy.
We extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations.
We derive new classification and recommendation rules that retain the transparency and interpretability of the existing risk assessment instrument.
arXiv Detail & Related papers (2021-09-22T00:52:03Z)
- Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset.
Access to the full belief distribution over policy values enables more flexible selection algorithms under a wider range of downstream evaluation metrics.
We show how BayesDICE may be used to rank policies with respect to an arbitrary downstream policy selection metric.
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches (a kernel-smoothed importance-sampling sketch appears after this list).
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
It considers the off-policy evaluation problem: estimating the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
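The conformal off-policy evaluation entries above rely on conformal prediction to produce intervals with a prescribed certainty level. Below is a minimal sketch of generic split conformal prediction applied to held-out per-episode returns; it illustrates only the basic calibrate-then-quantile mechanism, with roughly (1 - alpha) coverage under exchangeability of calibration and test scores, and is not the trajectory-level construction of the cited papers. The function split_conformal_interval and the toy data are assumptions made for illustration.

```python
# Generic split conformal prediction (not the construction from the cited
# papers): a calibration set turns any point predictor into an interval with
# roughly (1 - alpha) coverage under exchangeability.
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Return (lower, upper) interval bounds for X_new."""
    residuals = np.abs(y_cal - predict(X_cal))       # calibration scores
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal quantile rank
    q = np.sort(residuals)[min(k, n) - 1]
    preds = predict(X_new)
    return preds - q, preds + q

# Toy usage with a trivial point predictor (the mean return on the fitting split).
rng = np.random.default_rng(1)
returns = rng.normal(5.0, 2.0, size=1000)            # e.g. per-episode returns
fit, cal = returns[:500], returns[500:]               # proper fit/calibration split
predict = lambda X: np.full(len(X), fit.mean())
lo, hi = split_conformal_interval(predict, np.zeros((500, 1)), cal,
                                  np.zeros((3, 1)), alpha=0.1)
print(f"90% interval for a new return: [{lo[0]:.2f}, {hi[0]:.2f}]")
```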
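The doubly robust entry above notes that the density ratio, and hence the standard importance weight, is undefined for a deterministic target policy with continuous actions. A common workaround, sketched below, is a plain kernel-smoothed importance-sampling estimator that credits logged actions close to the target action; the cited paper builds doubly robust variants on ideas like this. The function kernel_is_value, the Gaussian kernel, the bandwidth, and the toy logging distribution are assumptions made for illustration.

```python
# Hypothetical sketch of kernel-smoothed importance sampling for a deterministic
# target policy with continuous actions: replace the undefined density ratio
# with a kernel weight on the distance between logged and target actions.
import numpy as np

def kernel_is_value(states, actions, rewards, behavior_density, target_policy,
                    bandwidth=0.2):
    """Kernel-smoothed IS estimate of the deterministic target policy's value."""
    target_actions = np.array([target_policy(s) for s in states])
    u = (actions - target_actions) / bandwidth
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel
    weights = k / (bandwidth * behavior_density)       # kernelized "ratio"
    return np.mean(weights * rewards)

# Toy usage: behavior policy is Uniform(0, 1); target always picks a = 0.7.
rng = np.random.default_rng(2)
states = rng.uniform(0, 1, 5000)
actions = rng.uniform(0, 1, 5000)                      # logged actions
rewards = 1.0 - (actions - 0.7) ** 2 + rng.normal(0, 0.1, 5000)
behavior_density = np.ones(5000)                       # Uniform(0, 1) density
print(kernel_is_value(states, actions, rewards, behavior_density,
                      target_policy=lambda s: 0.7))    # close to the true value ~1.0
```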
This list is automatically generated from the titles and abstracts of papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.