On Covariate Shift of Latent Confounders in Imitation and Reinforcement
Learning
- URL: http://arxiv.org/abs/2110.06539v1
- Date: Wed, 13 Oct 2021 07:31:31 GMT
- Title: On Covariate Shift of Latent Confounders in Imitation and Reinforcement
Learning
- Authors: Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik,
Uri Shalit
- Abstract summary: We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
- Score: 69.48387059607387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of using expert data with unobserved confounders for
imitation and reinforcement learning. We begin by defining the problem of
learning from confounded expert data in a contextual MDP setup. We analyze the
limitations of learning from such data with and without external reward, and
propose an adjustment of standard imitation learning algorithms to fit this
setup. We then discuss the problem of distribution shift between the expert
data and the online environment when the data is only partially observable. We
prove possibility and impossibility results for imitation learning under
arbitrary distribution shift of the missing covariates. When additional
external reward is provided, we propose a sampling procedure that addresses the
unknown shift and prove convergence to an optimal solution. Finally, we
validate our claims empirically on challenging assistive healthcare and
recommender system simulation tasks.
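As a rough illustration of the confounding issue described in the abstract (not the paper's algorithm), the sketch below contrasts naive behavioral cloning from confounded expert data with a context-reweighted variant. All quantities here (pi_expert, q_z, p_z, the tabular sizes) are hypothetical and chosen only to make the covariate shift of the latent confounder visible.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): imitation from expert
# data whose latent context z is never observed by the learner.
# All names and distributions here (pi_expert, q_z, p_z) are hypothetical.
rng = np.random.default_rng(0)
n_contexts, n_states, n_actions = 2, 5, 3

# The expert's policy depends on the hidden context z as well as the state s.
pi_expert = rng.dirichlet(np.ones(n_actions), size=(n_contexts, n_states))

q_z = np.array([0.8, 0.2])  # context distribution in the offline expert data
p_z = np.array([0.3, 0.7])  # context distribution in the online environment

# Naive behavioral cloning only sees (s, a) pairs, so it can at best recover
# the expert policy marginalized over the *offline* context distribution q_z.
pi_naive = np.einsum("z,zsa->sa", q_z, pi_expert)

# Under covariate shift of the latent confounder, the online mixture differs;
# if p_z were known, the confounded data could be reweighted toward it.
pi_shifted = np.einsum("z,zsa->sa", p_z, pi_expert)

print("max policy gap due to latent covariate shift:",
      np.abs(pi_naive - pi_shifted).max())
```

In practice the online context distribution is unknown; that unknown shift is what the paper's reward-guided sampling procedure is meant to handle. The sketch only visualizes why ignoring the shift biases the cloned policy.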
Related papers
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning [16.814772057210366]
We consider a transfer reinforcement learning problem in continuous state and action spaces under unobserved contextual information.
Our goal is to use the context-aware expert data to learn an optimal context-unaware policy for the learner using only a few new data samples.
arXiv Detail & Related papers (2021-06-07T17:49:22Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- On the Sample Complexity of Adversarial Multi-Source PAC Learning [46.24794665486056]
In a single-source setting, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability.
We show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources.
Our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.
arXiv Detail & Related papers (2020-02-24T17:19:04Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability (a generic triplet-loss sketch follows this entry).
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
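For context on the triplet loss mentioned in the last related entry, here is a minimal, generic sketch of a margin-based triplet loss in its standard formulation; it is not the Relation GAN discriminator from that paper, and the embeddings and margin value are hypothetical.

```python
import numpy as np

# Generic triplet margin loss sketch (standard formulation); this is not the
# Relation GAN discriminator from the cited paper, just the basic idea of
# pulling an anchor toward a positive and away from a negative embedding.
def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Hypothetical embeddings: e.g. anchor/positive from real samples, negative
# from generated samples, so the loss encourages an embedding that separates
# real from fake while keeping real samples close together.
rng = np.random.default_rng(1)
anchor = rng.normal(size=(8, 16))
positive = anchor + 0.1 * rng.normal(size=(8, 16))
negative = rng.normal(size=(8, 16))
print("triplet loss:", triplet_loss(anchor, positive, negative))
```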