Learning Robust Decision Policies from Observational Data
- URL: http://arxiv.org/abs/2006.02355v1
- Date: Wed, 3 Jun 2020 16:02:57 GMT
- Title: Learning Robust Decision Policies from Observational Data
- Authors: Muhammad Osama, Dave Zachariah, Peter Stoica
- Abstract summary: It is of interest to learn robust policies that reduce the risk of outcomes with high costs.
We develop a method for learning policies that reduce tails of the cost distribution at a specified level.
- Score: 21.05564340986074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of learning a decision policy from observational data
of past decisions in contexts with features and associated outcomes. The past
policy may be unknown and, in safety-critical applications such as medical
decision support, it is of interest to learn robust policies that reduce the
risk of outcomes with high costs. In this paper, we develop a method for
learning policies that reduce tails of the cost distribution at a specified
level and, moreover, provide a statistically valid bound on the cost of each
decision. These properties are valid under finite samples -- even in scenarios
with uneven or no overlap between features for different decisions in the
observed data -- by building on recent results in conformal prediction. The
performance and statistical properties of the proposed method are illustrated
using both real and synthetic data.
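As a rough illustration of the finite-sample guarantee described above, the following Python sketch uses a standard split-conformal quantile to bound the cost of each candidate decision from logged outcomes. It is a simplified stand-in and not the authors' estimator; the decision labels, the synthetic cost data, and the rule of choosing the decision with the smallest bound are hypothetical assumptions.

```python
import numpy as np

def conformal_cost_bound(calibration_costs, alpha=0.1):
    """Distribution-free upper bound covering a new cost with prob. >= 1 - alpha.

    Uses the split-conformal order statistic: the k-th smallest calibration
    cost with k = ceil((n + 1) * (1 - alpha)). Returns +inf when the
    calibration set is too small for a finite bound at level alpha.
    """
    costs = np.sort(np.asarray(calibration_costs, dtype=float))
    n = len(costs)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return float(costs[k - 1])

# Hypothetical usage: logged costs observed under two past decisions;
# pick the decision with the smaller conformal bound as a crude proxy
# for reducing high-cost (tail) outcomes.
rng = np.random.default_rng(0)
logged = {0: rng.exponential(1.0, size=200),
          1: rng.exponential(1.5, size=50)}
bounds = {a: conformal_cost_bound(y, alpha=0.1) for a, y in logged.items()}
robust_decision = min(bounds, key=bounds.get)
print(bounds, robust_decision)
```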
Related papers
- Predictive Performance Comparison of Decision Policies Under Confounding [32.21041697921289]
We propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches.
Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison.
arXiv Detail & Related papers (2024-04-01T01:27:07Z) - Learning under Selective Labels with Data from Heterogeneous
Decision-makers: An Instrumental Variable Approach [7.629248625993988]
We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled due to historical decision-making.
We propose a weighted learning approach that learns prediction rules robust to the label selection bias in both identification settings.
arXiv Detail & Related papers (2023-06-13T06:34:44Z) - Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z) - Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning; a minimal importance-weighting sketch appears after this list.
Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z) - Off-Policy Evaluation with Out-of-Sample Guarantees [21.527138355664174]
We consider the problem of evaluating the performance of a decision policy using past observational data.
We show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution.
The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
arXiv Detail & Related papers (2023-01-20T15:56:39Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Learning Pareto-Efficient Decisions with Confidence [21.915057426589748]
The paper considers the problem of multi-objective decision support when outcomes are uncertain.
The proposed approach quantifies trade-offs between decisions in terms of tail outcomes, which are relevant in safety-critical applications.
arXiv Detail & Related papers (2021-10-19T11:32:17Z) - Robust Batch Policy Learning in Markov Decision Processes [0.0]
We study the offline, data-driven sequential decision-making problem in the framework of a Markov decision process (MDP).
We propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution.
arXiv Detail & Related papers (2020-11-09T04:41:21Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)