Invariant Causal Prediction for Block MDPs
- URL: http://arxiv.org/abs/2003.06016v2
- Date: Thu, 11 Jun 2020 18:01:02 GMT
- Title: Invariant Causal Prediction for Block MDPs
- Authors: Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta
Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup
- Abstract summary: Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
- Score: 106.63346115341862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization across environments is critical to the successful application
of reinforcement learning algorithms to real-world challenges. In this paper,
we consider the problem of learning abstractions that generalize in block MDPs,
families of environments with a shared latent state space and dynamics
structure over that latent space, but varying observations. We leverage tools
from causal inference to propose a method of invariant prediction to learn
model-irrelevance state abstractions (MISA) that generalize to novel
observations in the multi-environment setting. We prove that for certain
classes of environments, this approach outputs with high probability a state
abstraction corresponding to the causal feature set with respect to the return.
We further provide more general bounds on model error and generalization error
in the multi-environment setting, in the process showing a connection between
causal variable selection and the state abstraction framework for MDPs. We give
empirical evidence that our methods work in both linear and nonlinear settings,
attaining improved generalization over single- and multi-task baselines.
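To make the invariant-prediction idea concrete, the following is a minimal, illustrative sketch of Invariant Causal Prediction (ICP)-style feature selection across environments: fit a pooled predictor of the return on each candidate subset of observed state features and keep only the subsets whose residuals look identically distributed across training environments. This is not the paper's MISA implementation; the linear model, the particular statistical tests (one-way ANOVA on residual means, Levene's test on variances), and all names are assumptions made purely for illustration.

```python
# Illustrative ICP-style feature selection across environments (assumed names).
from itertools import combinations

import numpy as np
from scipy import stats


def invariant_feature_sets(envs, alpha=0.05):
    """envs: list of (X, y) pairs, one per training environment, where X is an
    (n_i, d) array of observed state features and y is an (n_i,) array of
    returns. Returns the feature indices common to every candidate subset whose
    pooled linear fit has residuals that look invariant across environments."""
    d = envs[0][0].shape[1]
    accepted = []
    for k in range(1, d + 1):
        for subset in combinations(range(d), k):
            cols = list(subset)
            # Fit one linear predictor on data pooled over all environments
            # (no intercept, for brevity).
            X_pool = np.vstack([X[:, cols] for X, _ in envs])
            y_pool = np.concatenate([y for _, y in envs])
            beta, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)
            # Per-environment residuals of the pooled model.
            residuals = [y - X[:, cols] @ beta for X, y in envs]
            # Invariance check: equal residual means (ANOVA) and variances (Levene).
            _, p_mean = stats.f_oneway(*residuals)
            _, p_var = stats.levene(*residuals)
            if min(p_mean, p_var) > alpha:  # invariance not rejected
                accepted.append(set(subset))
    # ICP-style estimate: features present in every accepted subset.
    return set.intersection(*accepted) if accepted else set()
```

Under the invariance assumption, the returned features are a plausible stand-in for the causal parents of the return; the abstract's connection between causal variable selection and state abstraction can be read as lifting this kind of per-variable test to model-irrelevance abstractions over block MDP observations.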
Related papers
- Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms [34.593772931446125]
This monograph explores various model-based and model-free approaches for constrained reinforcement learning in the context of average-reward Markov Decision Processes (MDPs).
The primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs.
arXiv Detail & Related papers (2024-06-17T12:46:02Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - Using Forwards-Backwards Models to Approximate MDP Homomorphisms [11.020094184644789]
We propose a novel approach to constructing homomorphisms in discrete action spaces.
We use a learnt model of environment dynamics to infer which state-action pairs lead to the same state (an illustrative sketch of this idea appears after this list).
In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit.
arXiv Detail & Related papers (2022-09-14T00:38:12Z) - Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL).
BMCL effectively identifies class-invariant visual regions for classification and may serve as a general framework to improve the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z) - Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z) - Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z) - Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)
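As a rough illustration of the idea in the "Using Forwards-Backwards Models to Approximate MDP Homomorphisms" entry above (using a learnt dynamics model to infer which state-action pairs lead to the same state), the sketch below groups state-action pairs by their predicted next state. The model interface, the rounding-based equality test, and all names are assumptions, not the cited paper's method.

```python
# Illustrative sketch (assumed names): group state-action pairs whose learnt
# forward-model predictions agree, as a crude stand-in for an MDP homomorphism
# over a discrete action space.
from collections import defaultdict

import numpy as np


def group_equivalent_pairs(forward_model, states, actions, decimals=2):
    """forward_model(s, a) -> predicted next state (array-like).
    Returns a dict mapping a rounded predicted-next-state key to the list of
    (state_index, action) pairs that the model sends to that same state."""
    groups = defaultdict(list)
    for i, s in enumerate(states):
        for a in actions:
            pred = np.asarray(forward_model(s, a))
            key = tuple(np.round(pred, decimals))  # coarse equality test
            groups[key].append((i, a))
    return groups
```

Pairs that land in the same group could then be treated as interchangeable for value estimation, which is the intuition behind approximating an MDP homomorphism.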