Delphic Offline Reinforcement Learning under Nonidentifiable Hidden
Confounding
- URL: http://arxiv.org/abs/2306.01157v1
- Date: Thu, 1 Jun 2023 21:27:22 GMT
- Title: Delphic Offline Reinforcement Learning under Nonidentifiable Hidden
Confounding
- Authors: Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz
- Abstract summary: We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty.
We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them.
Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.
- Score: 10.315867984674032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A prominent challenge of offline reinforcement learning (RL) is the issue of
hidden confounding: unobserved variables may influence both the actions taken
by the agent and the observed outcomes. Hidden confounding can compromise the
validity of any causal conclusion drawn from data and presents a major obstacle
to effective offline RL. In the present paper, we tackle the problem of hidden
confounding in the nonidentifiable setting. We propose a definition of
uncertainty due to hidden confounding bias, termed delphic uncertainty, which
uses variation over world models compatible with the observations, and
differentiate it from the well-known epistemic and aleatoric uncertainties. We
derive a practical method for estimating the three types of uncertainties, and
construct a pessimistic offline RL algorithm to account for them. Our method
does not assume identifiability of the unobserved confounders, and attempts to
reduce the amount of confounding bias. We demonstrate through extensive
experiments and ablations the efficacy of our approach on a sepsis management
benchmark, as well as on electronic health records. Our results suggest that
nonidentifiable hidden confounding bias can be mitigated to improve offline RL
solutions in practice.
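As a rough illustration of the pessimism described in the abstract, the sketch below treats the disagreement among world models that all fit the observed data equally well as delphic uncertainty and subtracts it from the value estimate. The model list, feature vector, and penalty weight are illustrative placeholders, not the paper's estimation procedure.
```python
# Minimal sketch: spread of predictions across observation-compatible world
# models is taken as "delphic" uncertainty and used as a pessimistic penalty.
import numpy as np

def delphic_penalised_value(state_action, compatible_models, beta=1.0):
    """Pessimistic value estimate under nonidentifiable hidden confounding.

    compatible_models: callables mapping a (state, action) feature vector to a
    scalar return estimate; each stands for a world model that fits the
    observational data but resolves the hidden confounder differently.
    """
    predictions = np.array([m(state_action) for m in compatible_models])
    delphic_uncertainty = predictions.max() - predictions.min()  # spread over compatible models
    return predictions.mean() - beta * delphic_uncertainty

# Toy usage: three world models that agree on the data but extrapolate differently.
models = [lambda x: x.sum(), lambda x: x.sum() + 0.5, lambda x: x.sum() - 0.3]
print(delphic_penalised_value(np.array([0.2, 0.4]), models, beta=0.5))
```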
Related papers
- Offline Recommender System Evaluation under Unobserved Confounding [5.4208903577329375]
Off-Policy Estimation methods allow us to learn and evaluate decision-making policies from logged data.
An important assumption underlying these methods is the absence of unobserved confounders.
This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders.
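To make that assumption concrete, below is a minimal inverse-propensity-scoring (IPS) estimator: it is unbiased only if the logging propensities capture everything that influenced the logged actions, so an unobserved confounder that also drove actions and rewards biases the estimate. Variable names are illustrative, not taken from the cited paper.
```python
# Minimal IPS off-policy value estimate from logged bandit feedback.
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    weights = target_propensities / logging_propensities  # importance weights
    return np.mean(weights * rewards)

r = np.array([1.0, 0.0, 1.0, 1.0])
p_log = np.array([0.5, 0.25, 0.5, 0.25])   # behaviour policy's action probabilities
p_tgt = np.array([0.8, 0.1, 0.8, 0.1])     # evaluation policy's action probabilities
print(ips_estimate(r, p_log, p_tgt))
```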
arXiv Detail & Related papers (2023-09-08T09:11:26Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
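The sketch below shows only the evidential representation that such methods reweight: per-class evidence parameterises a Dirichlet distribution whose total concentration yields an uncertainty score. The Fisher-information-based reweighting that defines $\mathcal{I}$-EDL is not reproduced here.
```python
# Evidential-deep-learning style prediction from nonnegative class evidence.
import numpy as np

def evidential_prediction(evidence):
    """evidence: nonnegative array of shape (num_classes,)."""
    alpha = evidence + 1.0                  # Dirichlet concentration parameters
    probs = alpha / alpha.sum()             # expected class probabilities
    uncertainty = len(alpha) / alpha.sum()  # vacuity: high when total evidence is low
    return probs, uncertainty

print(evidential_prediction(np.array([9.0, 1.0, 0.0])))   # confident prediction
print(evidential_prediction(np.array([0.1, 0.2, 0.1])))   # uncertain prediction
```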
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Anti-Exploration by Random Network Distillation [63.04360288089277]
We show that a naive choice of conditioning for the Random Network Distillation (RND) is not discriminative enough to be used as an uncertainty estimator.
We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM).
We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
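A minimal sketch of an RND-style anti-exploration bonus follows: a fixed random target network and a trained predictor, with the prediction error on a state-action pair serving as the penalty. For brevity it uses the naive state-action concatenation that the paper argues is insufficient; the FiLM conditioning is not shown, and all layer sizes are illustrative.
```python
# RND-style penalty: prediction error w.r.t. a frozen random target network.
import torch
import torch.nn as nn

state_dim, action_dim, embed_dim = 17, 6, 32
target = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))
predictor = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target stays random and fixed

def anti_exploration_bonus(state, action):
    x = torch.cat([state, action], dim=-1)
    return (predictor(x) - target(x)).pow(2).mean(dim=-1)  # high for unfamiliar pairs

bonus = anti_exploration_bonus(torch.randn(8, state_dim), torch.randn(8, action_dim))
print(bonus.shape)  # one penalty per transition in the batch
```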
arXiv Detail & Related papers (2023-01-31T13:18:33Z)
- Causal Deep Reinforcement Learning Using Observational Data [11.790171301328158]
We propose two deconfounding methods in deep reinforcement learning (DRL).
The methods first estimate an importance weight for each sample using causal inference techniques, and then adjust each sample's contribution to the loss function accordingly.
We prove the effectiveness of our deconfounding methods and validate them experimentally.
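The general recipe can be sketched as a weighted loss, with the per-sample weights standing in for the importance degrees the paper derives via causal inference; the numbers below are purely illustrative.
```python
# Reweighting each sample's contribution to the training loss.
import numpy as np

def weighted_loss(per_sample_losses, importance_weights):
    w = importance_weights / importance_weights.sum()  # normalise the weights
    return np.sum(w * per_sample_losses)

losses = np.array([0.8, 0.2, 1.5, 0.4])
weights = np.array([1.0, 0.3, 2.0, 1.0])  # illustrative "importance degrees"
print(weighted_loss(losses, weights))
```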
arXiv Detail & Related papers (2022-11-28T14:34:39Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
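The core idea can be sketched as penalising a Q-estimate by the disagreement of a bootstrapped ensemble, so out-of-distribution actions with high disagreement receive low pessimistic values. The toy ensemble and penalty coefficient below are illustrative and omit PBRL's networks, bootstrapping, and OOD sampling.
```python
# Ensemble-disagreement pessimism for offline value estimation.
import numpy as np

def pessimistic_q(state_action, q_ensemble, beta=2.0):
    qs = np.array([q(state_action) for q in q_ensemble])
    return qs.mean() - beta * qs.std()  # penalise by ensemble disagreement

ensemble = [lambda x, b=b: float(x @ np.array([1.0, -0.5]) + b) for b in (0.0, 0.1, -0.2)]
print(pessimistic_q(np.array([0.3, 0.7]), ensemble))
```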
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
- The Hidden Uncertainty in a Neural Network's Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs.
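A common instantiation of this idea, sketched below under the assumption that hidden activations are roughly Gaussian, fits a Gaussian to training-set features and scores new inputs by Mahalanobis distance; the feature extractor itself is abstracted away, with the arrays standing in for a network's latent representations.
```python
# Density-based OOD scoring over latent features via Mahalanobis distance.
import numpy as np

def fit_gaussian(train_features):
    mu = train_features.mean(axis=0)
    cov = np.cov(train_features, rowvar=False) + 1e-6 * np.eye(train_features.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    d = x - mu
    return float(d @ cov_inv @ d)  # large distance suggests an out-of-distribution input

train_features = np.random.randn(500, 8)
mu, cov_inv = fit_gaussian(train_features)
print(mahalanobis_score(np.random.randn(8), mu, cov_inv))
print(mahalanobis_score(10 + np.random.randn(8), mu, cov_inv))  # far from training data
```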
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
- Deep Learning based Uncertainty Decomposition for Real-time Control [9.067368638784355]
We propose a novel method for detecting the absence of training data using deep learning.
We show its advantages over existing approaches on synthetic and real-world datasets.
We further demonstrate the practicality of this uncertainty estimate in deploying online data-efficient control on a simulated quadcopter.
arXiv Detail & Related papers (2020-10-06T10:46:27Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
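A minimal way to realise such a signal, sketched below, is to compute temporal-difference errors under an ensemble of value estimates and use their spread as the exploration signal; the paper's construction of the distribution over TD errors is not reproduced.
```python
# Spread of TD errors across a value ensemble as an exploration signal.
import numpy as np

def td_error_uncertainty(reward, gamma, v_s_ensemble, v_next_ensemble):
    td_errors = reward + gamma * np.asarray(v_next_ensemble) - np.asarray(v_s_ensemble)
    return td_errors.std()  # high spread -> uncertain value estimate -> explore

print(td_error_uncertainty(reward=1.0, gamma=0.99,
                           v_s_ensemble=[2.0, 2.1, 1.9],
                           v_next_ensemble=[1.5, 2.5, 0.8]))
```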
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction [6.170898159041278]
We present a novel variational recurrent network that estimates the distribution of missing variables, updates hidden states, and predicts the possibility of in-hospital mortality.
It is noteworthy that our model can conduct these procedures in a single stream and learn all network parameters jointly in an end-to-end manner.
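A rough sketch of that single-stream design is given below, assuming a GRU backbone and a Gaussian imputation head: at each time step missing variables are sampled from a learned distribution, the recurrent state is updated, and a mortality probability is read out from the final state. Layer sizes and heads are illustrative, not the paper's architecture.
```python
# Single-stream recurrent model that imputes missing EHR variables on the fly.
import torch
import torch.nn as nn

class MissingAwareRNN(nn.Module):
    def __init__(self, n_vars, hidden=32):
        super().__init__()
        self.impute = nn.Linear(hidden, 2 * n_vars)  # per-variable mean and log-variance
        self.cell = nn.GRUCell(n_vars, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, mask):
        # x, mask: (batch, time, n_vars); mask is 1 where a variable was observed
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        for t in range(x.size(1)):
            mean, logvar = self.impute(h).chunk(2, dim=-1)
            sampled = mean + torch.randn_like(mean) * (0.5 * logvar).exp()
            filled = mask[:, t] * x[:, t] + (1 - mask[:, t]) * sampled  # keep observed values
            h = self.cell(filled, h)
        return torch.sigmoid(self.out(h)).squeeze(-1)  # in-hospital mortality probability

model = MissingAwareRNN(n_vars=12)
probs = model(torch.randn(4, 10, 12), torch.bernoulli(torch.full((4, 10, 12), 0.7)))
print(probs.shape)
```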
arXiv Detail & Related papers (2020-03-02T04:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.