Conservative Policy Construction Using Variational Autoencoders for
Logged Data with Missing Values
- URL: http://arxiv.org/abs/2109.03747v1
- Date: Wed, 8 Sep 2021 16:09:47 GMT
- Title: Conservative Policy Construction Using Variational Autoencoders for
Logged Data with Missing Values
- Authors: Mahed Abroshan, Kai Hou Yip, Cem Tekin, Mihaela van der Schaar
- Abstract summary: We consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features.
The goal is to recommend an action when $\Xt$, a degraded version of $\Xb$ with missing values, is observed.
In particular, we introduce the \textit{conservative strategy}, where the policy is designed to safely handle the uncertainty due to missingness.
- Score: 77.99648230758491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In high-stakes applications of data-driven decision making like healthcare,
it is of paramount importance to learn a policy that maximizes the reward while
avoiding potentially dangerous actions when there is uncertainty. There are two
main challenges usually associated with this problem. Firstly, learning through
online exploration is not possible due to the critical nature of such
applications. Therefore, we need to resort to observational datasets with no
counterfactuals. Secondly, such datasets are usually imperfect, additionally
cursed with missing values in the attributes of features. In this paper, we
consider the problem of constructing personalized policies using logged data
when there are missing values in the attributes of features in both training
and test data. The goal is to recommend an action (treatment) when $\Xt$, a
degraded version of $\Xb$ with missing values, is observed. We consider three
strategies for dealing with missingness. In particular, we introduce the
\textit{conservative strategy} where the policy is designed to safely handle
the uncertainty due to missingness. In order to implement this strategy, we need
to estimate the posterior distribution $p(\Xb|\Xt)$, which we do using a
variational autoencoder. In particular, our method is based on partial
variational autoencoders (PVAE), which are designed to capture the underlying
structure of features with missing values.
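As a rough illustration of how the conservative strategy could be realized, the sketch below combines a sampler approximating $p(\Xb|\Xt)$ (in the paper this role is played by a partial VAE; here a toy Gaussian stand-in) with a reward model fitted on the logged data, and recommends the action that maximizes a lower quantile of the predicted reward across posterior completions. The names `posterior_sampler`, `reward_model`, and `conservative_action`, as well as the quantile criterion, are assumptions introduced for illustration only, not the paper's exact construction.

```python
import numpy as np

def conservative_action(x_tilde, observed_mask, posterior_sampler,
                        reward_model, actions, n_samples=100, quantile=0.1):
    """Pick the action maximizing a lower quantile of predicted reward.

    x_tilde          : 1-D feature vector with missing attributes (entries
                       where observed_mask is False are ignored).
    observed_mask    : boolean array, True where the attribute is observed.
    posterior_sampler: callable (x_tilde, mask, n) -> (n, d) array of
                       plausible completions drawn from (an approximation of)
                       p(X | X_tilde), e.g. a partial VAE.
    reward_model     : callable (x, a) -> estimated reward of action a at x.
    """
    completions = posterior_sampler(x_tilde, observed_mask, n_samples)
    best_action, best_score = None, -np.inf
    for a in actions:
        rewards = np.array([reward_model(x, a) for x in completions])
        score = np.quantile(rewards, quantile)   # conservative aggregate
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Toy usage: a naive Gaussian "posterior" stands in for the partial VAE.
rng = np.random.default_rng(0)

def toy_sampler(x_tilde, mask, n):
    samples = np.tile(x_tilde, (n, 1)).astype(float)
    samples[:, ~mask] = rng.normal(size=(n, int((~mask).sum())))
    return samples

x_tilde = np.array([1.2, 0.0, -0.5])        # second attribute is missing
mask = np.array([True, False, True])
reward = lambda x, a: -abs(x[1] - a)        # reward depends on the missing attribute
print(conservative_action(x_tilde, mask, toy_sampler, reward, actions=[-1.0, 0.0, 1.0]))
```

Using the mean of the predicted rewards instead of a lower quantile would recover a plain imputation-based policy; moving the quantile toward the minimum makes the recommendation increasingly cautious as the posterior over the missing attributes becomes more diffuse.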
Related papers
- Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning [2.793095554369282]
Most popular methods for off-policy learning include policy improvement steps where a learned state-action ($Q$) value function is maximized over selected batches of data.
We revisit this strategy in environments with "mixed-sign" reward functions.
We find that this second approach, when applied to continuous action spaces with mixed-sign rewards, consistently and significantly outperforms state-of-the-art methods augmented by resetting.
arXiv Detail & Related papers (2023-11-30T16:31:04Z)
- One-Shot Strategic Classification Under Unknown Costs [19.390528752448283]
We show that for a broad class of costs, even small mis-estimations of the cost function can entail trivial accuracy in the worst case.
Our analysis reveals important strategic responses, particularly the value of dual regularization with respect to the cost manipulation function.
arXiv Detail & Related papers (2023-11-05T20:43:08Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- Leveraging variational autoencoders for multiple data imputation [0.5156484100374059]
We investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies.
We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations.
To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification.
arXiv Detail & Related papers (2022-09-30T08:58:43Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively-studied linear models, but in presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks, exponential in the number of input features, which makes predictions impossible for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.