Conservative Policy Construction Using Variational Autoencoders for
Logged Data with Missing Values
- URL: http://arxiv.org/abs/2109.03747v1
- Date: Wed, 8 Sep 2021 16:09:47 GMT
- Title: Conservative Policy Construction Using Variational Autoencoders for
Logged Data with Missing Values
- Authors: Mahed Abroshan, Kai Hou Yip, Cem Tekin, Mihaela van der Schaar
- Abstract summary: We consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features.
The goal is to recommend an action when $\Xt$, a degraded version of $\Xb$ with missing values, is observed.
In particular, we introduce the \textit{conservative strategy}, where the policy is designed to safely handle the uncertainty due to missingness.
- Score: 77.99648230758491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In high-stakes applications of data-driven decision making like healthcare,
it is of paramount importance to learn a policy that maximizes the reward while
avoiding potentially dangerous actions when there is uncertainty. There are two
main challenges usually associated with this problem. Firstly, learning through
online exploration is not possible due to the critical nature of such
applications. Therefore, we need to resort to observational datasets with no
counterfactuals. Secondly, such datasets are usually imperfect, additionally
cursed with missing values in the attributes of features. In this paper, we
consider the problem of constructing personalized policies using logged data
when there are missing values in the attributes of features in both training
and test data. The goal is to recommend an action (treatment) when $\Xt$, a
degraded version of $\Xb$ with missing values, is observed. We consider three
strategies for dealing with missingness. In particular, we introduce the
\textit{conservative strategy} where the policy is designed to safely handle
the uncertainty due to missingness. In order to implement this strategy, we need
to estimate the posterior distribution $p(\Xb|\Xt)$, which we do using a
variational autoencoder. In particular, our method is based on partial
variational autoencoders (PVAE), which are designed to capture the underlying
structure of features with missing values.
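As a rough illustration of how the conservative strategy could be realized, the sketch below combines a sampler approximating $p(\Xb|\Xt)$ (in the paper this role is played by a partial VAE; here a toy Gaussian stand-in) with a reward model fitted on the logged data, and recommends the action that maximizes a lower quantile of the predicted reward across posterior completions. The names `posterior_sampler`, `reward_model`, and `conservative_action`, as well as the quantile criterion, are assumptions introduced for illustration only, not the paper's exact construction.

```python
import numpy as np

def conservative_action(x_tilde, observed_mask, posterior_sampler,
                        reward_model, actions, n_samples=100, quantile=0.1):
    """Pick the action maximizing a lower quantile of predicted reward.

    x_tilde          : 1-D feature vector with missing attributes (entries
                       where observed_mask is False are ignored).
    observed_mask    : boolean array, True where the attribute is observed.
    posterior_sampler: callable (x_tilde, mask, n) -> (n, d) array of
                       plausible completions drawn from (an approximation of)
                       p(X | X_tilde), e.g. a partial VAE.
    reward_model     : callable (x, a) -> estimated reward of action a at x.
    """
    completions = posterior_sampler(x_tilde, observed_mask, n_samples)
    best_action, best_score = None, -np.inf
    for a in actions:
        rewards = np.array([reward_model(x, a) for x in completions])
        score = np.quantile(rewards, quantile)   # conservative aggregate
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Toy usage: a naive Gaussian "posterior" stands in for the partial VAE.
rng = np.random.default_rng(0)

def toy_sampler(x_tilde, mask, n):
    samples = np.tile(x_tilde, (n, 1)).astype(float)
    samples[:, ~mask] = rng.normal(size=(n, int((~mask).sum())))
    return samples

x_tilde = np.array([1.2, 0.0, -0.5])        # second attribute is missing
mask = np.array([True, False, True])
reward = lambda x, a: -abs(x[1] - a)        # reward depends on the missing attribute
print(conservative_action(x_tilde, mask, toy_sampler, reward, actions=[-1.0, 0.0, 1.0]))
```

Using the mean of the predicted rewards instead of a lower quantile would recover a plain imputation-based policy; moving the quantile toward the minimum makes the recommendation increasingly cautious as the posterior over the missing attributes becomes more diffuse.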
Related papers
- Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning [2.793095554369282]
Most popular methods for off-policy learning include policy improvement steps where a learned state-action ($Q$) value function is maximized over selected batches of data.
We revisit this strategy in environments with "mixed-sign" reward functions.
We find that this second approach, when applied to continuous action spaces with mixed-sign rewards, consistently and significantly outperforms state-of-the-art methods augmented by resetting.
arXiv Detail & Related papers (2023-11-30T16:31:04Z)
- One-Shot Strategic Classification Under Unknown Costs [19.390528752448283]
We show that for a broad class of costs, even small mis-estimations of the cost function can entail trivial accuracy in the worst case.
Our analysis reveals important strategic responses, particularly the value of dual regularization with respect to the cost manipulation function.
arXiv Detail & Related papers (2023-11-05T20:43:08Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- Leveraging variational autoencoders for multiple data imputation [0.5156484100374059]
We investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies.
We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations.
To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification.
arXiv Detail & Related papers (2022-09-30T08:58:43Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively-studied linear models, but in presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks, exponential in the number of input features, which makes predictions impossible for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.