Causal Reinforcement Learning using Observational and Interventional Data
- URL: http://arxiv.org/abs/2106.14421v1
- Date: Mon, 28 Jun 2021 06:58:20 GMT
- Title: Causal Reinforcement Learning using Observational and Interventional Data
- Authors: Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer
- Abstract summary: Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs.
We consider a scenario where the learning agent can collect online experiences through direct interactions with the environment.
We then ask: can the online and offline experiences be safely combined for learning a causal model, and can the offline experiences be expected to improve the agent's performance?
- Score: 14.856472820492364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiently learning a causal model of the environment is a key challenge for
model-based RL agents operating in POMDPs. We consider here a scenario where the
learning agent can collect online experiences through direct interactions with the
environment (interventional data), but also has access to a large collection of offline
experiences, obtained by observing another agent interacting with the environment
(observational data). A key ingredient that makes this situation non-trivial is that we
allow the observed agent to interact with the environment based on hidden information,
which is not observed by the learning agent. We then ask the following questions: can
the online and offline experiences be safely combined for learning a causal model? And
can we expect the offline experiences to improve the agent's performance? To answer
these questions, we import ideas from the well-established causal framework of
do-calculus, and we express model-based reinforcement learning as a causal inference
problem. Then, we propose a general yet simple methodology for leveraging offline data
during learning. In a nutshell, the method relies on learning a latent-based causal
transition model that explains both the interventional and observational regimes, and
then using the recovered latent variable to infer the standard POMDP transition model
via deconfounding. We prove our method is correct and efficient in the sense that it
attains better generalization guarantees thanks to the offline data (in the asymptotic
case), and we illustrate its effectiveness empirically on synthetic toy problems. Our
contribution aims to bridge the gap between the fields of reinforcement learning and
causality.
To answer these questions, we import ideas from the well-established causal
framework of do-calculus, and we express model-based reinforcement learning as
a causal inference problem. Then, we propose a general yet simple methodology
for leveraging offline data during learning. In a nutshell, the method relies
on learning a latent-based causal transition model that explains both the
interventional and observational regimes, and then using the recovered latent
variable to infer the standard POMDP transition model via deconfounding. We
prove our method is correct and efficient in the sense that it attains better
generalization guarantees due to the offline data (in the asymptotic case), and
we illustrate its effectiveness empirically on synthetic toy problems. Our
contribution aims at bridging the gap between the fields of reinforcement
learning and causality.
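As a concrete, heavily simplified illustration of that recipe, the sketch below fits a tabular latent-variable transition model to both regimes with EM and then marginalizes the latent out to recover the interventional transitions. The discrete confounder of fixed cardinality K, its independence from the current state, the EM estimator, and the function name are all simplifying assumptions made here; the paper's actual method and guarantees are more general.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, K = 4, 2, 2  # states, actions, assumed confounder cardinality (K is a guess)

def fit_deconfounded_transitions(obs, intv, n_iter=200):
    """EM sketch: fit p(u), pi(a|s,u), T(s'|s,a,u) jointly to observational
    and interventional (s, a, s') triples, then deconfound by marginalizing u.

    obs:  (N_obs, 3) int array of confounded triples (behavior depends on u)
    intv: (N_int, 3) int array of triples gathered under do(a)
    """
    p_u = np.full(K, 1.0 / K)
    pi = rng.dirichlet(np.ones(A), size=(K, S))    # pi[u, s, a], obs regime only
    T = rng.dirichlet(np.ones(S), size=(K, S, A))  # T[u, s, a, s'], shared
    for _ in range(n_iter):
        # E-step: posterior responsibilities over u for each transition.
        r_obs = p_u[:, None] * pi[:, obs[:, 0], obs[:, 1]] \
                             * T[:, obs[:, 0], obs[:, 1], obs[:, 2]]
        # Under do(a) the u -> a edge is cut, so the policy term drops out.
        r_int = p_u[:, None] * T[:, intv[:, 0], intv[:, 1], intv[:, 2]]
        r_obs /= r_obs.sum(0, keepdims=True)
        r_int /= r_int.sum(0, keepdims=True)
        # M-step: re-estimate every factor from soft counts.
        p_u = r_obs.sum(1) + r_int.sum(1)
        p_u /= p_u.sum()
        pi_new = np.full((K, S, A), 1e-6)
        T_new = np.full((K, S, A, S), 1e-6)
        for k in range(K):
            np.add.at(pi_new[k], (obs[:, 0], obs[:, 1]), r_obs[k])
            np.add.at(T_new[k], (obs[:, 0], obs[:, 1], obs[:, 2]), r_obs[k])
            np.add.at(T_new[k], (intv[:, 0], intv[:, 1], intv[:, 2]), r_int[k])
        pi = pi_new / pi_new.sum(-1, keepdims=True)
        T = T_new / T_new.sum(-1, keepdims=True)
    # Deconfounded transition model: p(s' | s, do(a)) = sum_u p(u) T(s'|s,a,u).
    return np.einsum('k,ksan->san', p_u, T)
```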
Related papers
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
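The pessimism ingredient can be pictured with a count-based lower confidence bound on an empirical mediator distribution. This is only a generic illustration under tabular assumptions; the function name and the Hoeffding-style width are inventions here, not PESCAL's actual construction.

```python
import numpy as np

def lcb_mediator(counts, delta=0.1):
    """Pessimistic (lower-confidence-bound) estimate of p(m | s, a).

    counts[s, a, m]: offline occurrence counts of mediator m after (s, a).
    A Hoeffding-style width is used purely for illustration.
    """
    n = counts.sum(-1, keepdims=True)
    p_hat = counts / np.maximum(n, 1)
    width = np.sqrt(np.log(1.0 / delta) / (2.0 * np.maximum(n, 1)))
    return np.clip(p_hat - width, 0.0, None)  # pessimistic, left unnormalized
```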
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy [40.33036146207819]
We explicitly model the generation process of states with a graphical causal model.
We fold causal structure updating into the RL interaction process through active interventions on the environment, as sketched below.
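A toy version of that active-intervention loop, assuming Bernoulli beliefs over directed edges among d scalar state variables: intervene where the edge beliefs are most uncertain, then nudge beliefs toward what actually changed. Both the entropy criterion and the update rule are placeholders, not the paper's method.

```python
import numpy as np

def choose_intervention(edge_belief):
    """Pick the variable whose outgoing edges are most uncertain.

    edge_belief[i, j]: current belief that an edge i -> j exists.
    """
    eps = 1e-9
    h = -(edge_belief * np.log(edge_belief + eps)
          + (1 - edge_belief) * np.log(1 - edge_belief + eps))
    return int(h.sum(axis=1).argmax())

def update_edges(edge_belief, i, changed, lr=0.2):
    """After do(X_i), move belief in each edge i -> j toward whether X_j moved."""
    edge_belief[i] += lr * (changed.astype(float) - edge_belief[i])
    return edge_belief
```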
arXiv Detail & Related papers (2024-02-07T14:09:34Z)
- Causal Disentangled Variational Auto-Encoder for Preference Understanding in Recommendation [50.93536377097659]
This paper introduces the Causal Disentangled Variational Auto-Encoder (CaD-VAE), a novel approach for learning causal disentangled representations from interaction data in recommender systems.
The approach utilizes structural causal models to generate causal representations that describe the causal relationships among latent factors.
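One common way to realize such a structural prior is a linear SCM over the latent factors, z = A^T z + eps, with A a learned adjacency matrix. The module below is a sketch in that spirit (close to CausalVAE-style layers), not CaD-VAE's actual architecture.

```python
import torch
import torch.nn as nn

class SCMLatentLayer(nn.Module):
    """Map exogenous noise eps to causally structured latents via a linear SCM.

    Solves z = A^T z + eps; for a batch of row vectors this is
    Z = E (I - A)^{-1}. A is a learned (soft) adjacency over latent factors.
    """
    def __init__(self, d):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(d, d))

    def forward(self, eps):
        d = self.A.shape[0]
        eye = torch.eye(d, device=eps.device)
        mask = 1.0 - eye  # forbid self-loops
        return eps @ torch.linalg.inv(eye - self.A * mask)
```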
arXiv Detail & Related papers (2023-04-17T00:10:56Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a mutual-information term inspired by variational empowerment into a state-space model.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
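Such a mutual-information term is commonly lower-bounded with a learned inverse model q(a | z, z') (a Barber-Agakov-style bound). The sketch below shows only that bonus for discrete actions; how the paper wires it into the full state-space model objective is not reproduced here.

```python
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Variational q(a | z, z') over discrete actions."""
    def __init__(self, z_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def log_prob(self, z, z_next, a):
        logits = self.net(torch.cat([z, z_next], dim=-1))
        return torch.distributions.Categorical(logits=logits).log_prob(a)

def empowerment_bonus(inv, z, z_next, a, log_pi_a):
    # E[log q(a | z, z') - log pi(a | z)] lower-bounds I(a; z' | z).
    return (inv.log_prob(z, z_next, a) - log_pi_a).mean()
```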
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel objective, the free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
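For flavor, here is a simplified expected-free-energy-style score for a single policy step in a discrete generative model: a goal-seeking term plus an epistemic term. The exact decomposition the paper derives (its "free energy of the expected future") differs from this textbook-style stand-in, and the function name is invented.

```python
import numpy as np

def efe_score(q_s, likelihood, log_pref_o):
    """Score a policy step: expected outcome log-preference + outcome entropy.

    q_s:        predicted state distribution under the policy, shape (S,)
    likelihood: p(o | s), shape (O, S)
    log_pref_o: log preferences over outcomes, shape (O,)
    """
    q_o = likelihood @ q_s                           # predicted outcomes
    extrinsic = q_o @ log_pref_o                     # goal-seeking term
    epistemic = -(q_o * np.log(q_o + 1e-12)).sum()   # crude curiosity proxy
    return extrinsic + epistemic                     # higher is better here
```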
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Feature-Based Interpretable Reinforcement Learning based on State-Transition Models [3.883460584034766]
Growing concern regarding the operational use of AI models in the real world has caused a surge of interest in explaining AI models' decisions to humans.
We propose a method for offering local explanations on risk in reinforcement learning.
arXiv Detail & Related papers (2021-05-14T23:43:11Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm that sidesteps this reliance on model-free reinforcement learning.
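The adversarial IfO reward signal itself is easy to picture: a discriminator over state transitions (s, s') separates demonstrator from imitator, and its output becomes a learned reward, with no demonstrator actions required. The sketch below covers only that shared ingredient, not DEALIO's model-based, data-efficient learner.

```python
import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    """Score (s, s') pairs: high when they look like demonstrator transitions."""
    def __init__(self, s_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * s_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def reward(self, s, s_next):
        d = torch.sigmoid(self.net(torch.cat([s, s_next], dim=-1)))
        return -torch.log(1.0 - d + 1e-8)  # GAIL-style imitation reward
```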
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Offline Learning for Planning: A Summary [0.0]
Training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment.
Data sets containing recorded experiences of intelligent agents performing various tasks are accessible on the internet.
In this paper, we outline the ideas motivating the development of the state-of-the-art offline learning baselines.
arXiv Detail & Related papers (2020-10-05T11:41:11Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
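In caricature, DOVI-style updates look like optimistic value iteration in which the observational data, after a backdoor-style adjustment, shrinks the uncertainty bonus. The backup below abstracts that adjustment into effective per-(s, a) counts; the names and the bonus form are illustrative, not DOVI's actual algorithm.

```python
import numpy as np

def optimistic_backup(Q, P_hat, r, n_eff, beta=1.0, gamma=0.99):
    """One optimistic Bellman backup with a count-based exploration bonus.

    Q: (S, A) current value estimates        P_hat: (S, A, S) estimated model
    r: (S, A) rewards                        n_eff: (S, A) effective counts,
    where deconfounded observational data would inflate n_eff and thereby
    shrink the bonus.
    """
    bonus = beta / np.sqrt(np.maximum(n_eff, 1))
    V = Q.max(axis=1)                 # greedy value estimate per state
    return r + bonus + gamma * (P_hat @ V)
```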
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.