Invariant Causal Imitation Learning for Generalizable Policies
- URL: http://arxiv.org/abs/2311.01489v1
- Date: Thu, 2 Nov 2023 16:52:36 GMT
- Title: Invariant Causal Imitation Learning for Generalizable Policies
- Authors: Ioana Bica, Daniel Jarrett, Mihaela van der Schaar
- Abstract summary: We propose Invariant Causal Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
- Score: 87.51882102248395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider learning an imitation policy on the basis of demonstrated behavior
from multiple environments, with an eye towards deployment in an unseen
environment. Since the observable features from each setting may be different,
directly learning individual policies as mappings from features to actions is
prone to spurious correlations -- and may not generalize well. However, the
expert's policy is often a function of a shared latent structure underlying
those observable features that is invariant across settings. By leveraging data
from multiple environments, we propose Invariant Causal Imitation Learning
(ICIL), a novel technique in which we learn a feature representation that is
invariant across domains, on the basis of which we learn an imitation policy
that matches expert behavior. To cope with transition dynamics mismatch, ICIL
learns a shared representation of causal features (for all training
environments), that is disentangled from the specific representations of noise
variables (for each of those environments). Moreover, to ensure that the
learned policy matches the observation distribution of the expert's policy,
ICIL estimates the energy of the expert's observations and uses a
regularization term that minimizes the imitator policy's next state energy.
Experimentally, we compare our method against several benchmarks in control
and healthcare tasks and show its effectiveness in learning imitation policies
capable of generalizing to unseen environments.
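To make the two mechanisms described above concrete (a causal representation shared across environments and disentangled from per-environment noise, plus an energy-based regularizer on next states), the following is a minimal PyTorch-style sketch. It assumes continuous actions and plain MLP networks; the names (ICILSketch, causal_enc, noise_encs, energy), the cross-covariance disentanglement surrogate, and the loss weights are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


class ICILSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, rep_dim, n_envs):
        super().__init__()
        self.causal_enc = mlp(obs_dim, rep_dim)             # shared across all training envs
        self.noise_encs = nn.ModuleList(                    # one noise encoder per training env
            [mlp(obs_dim, rep_dim) for _ in range(n_envs)])
        self.policy = mlp(rep_dim, act_dim)                 # acts on causal features only
        self.dynamics = mlp(rep_dim + act_dim, rep_dim)     # shared dynamics over causal features
        self.energy = mlp(obs_dim, 1)                       # energy model of expert observations

    def losses(self, obs, act, next_obs, env_id):
        z_c = self.causal_enc(obs)
        z_n = self.noise_encs[env_id](obs)

        # (1) Imitation: match expert actions from the invariant (causal) representation.
        bc_loss = F.mse_loss(self.policy(z_c), act)

        # (2) Invariance surrogate: one dynamics model over causal features must
        #     predict the next causal state in every environment.
        dyn_loss = F.mse_loss(self.dynamics(torch.cat([z_c, act], dim=-1)),
                              self.causal_enc(next_obs).detach())

        # (3) Disentanglement surrogate: penalize cross-covariance between causal
        #     and noise features (stands in for the mutual-information objective).
        z_c_cent = z_c - z_c.mean(dim=0)
        z_n_cent = z_n - z_n.mean(dim=0)
        disent_loss = (z_c_cent.T @ z_n_cent / obs.shape[0]).pow(2).mean()

        # (4) Energy regularizer: with self.energy fitted to expert observations
        #     elsewhere, low energy keeps the imitator's rollouts on the support of
        #     the expert's observation distribution (next_obs stands in for the
        #     imitator's next states in this sketch).
        energy_loss = self.energy(next_obs).mean()

        return bc_loss + dyn_loss + 0.1 * disent_loss + 0.01 * energy_loss
```

In practice the energy model would be fitted separately on expert observations (e.g., by contrastive estimation), and the disentanglement term would follow the paper's mutual-information-based objective; the sketch only shows where each loss enters.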
Related papers
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation [5.090135391530077]
Policies with diverse behavior characteristics can generalize to downstream environments with various discrepancies.
However, such policies might also result in catastrophic damage when deployed in practical scenarios such as real-world systems.
We propose Diversity in Regulation (DiR), which trains diverse policies with regulated behaviors to discover desired patterns.
arXiv Detail & Related papers (2022-09-24T15:13:51Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions (see the sketch after this list).
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that issues with these metrics stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Instance based Generalization in Reinforcement Learning [24.485597364200824]
We analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs).
We prove that, independently of the exploration strategy, reusing instances introduces significant changes on the effective Markov dynamics the agent observes during training.
We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance specific exploitation.
arXiv Detail & Related papers (2020-11-02T16:19:44Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy such that the actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
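As a concrete illustration of the object-aware regularization idea summarized for OREO above, here is a minimal sketch of a uniform-attention penalty added to a behavioral-cloning loss. The slot-attention policy, the KL-to-uniform form of the penalty, and the weight are illustrative assumptions and not the paper's actual mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotAttentionPolicy(nn.Module):
    """Toy policy that attends over a set of object slots before predicting an action."""

    def __init__(self, slot_dim, act_dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(slot_dim))
        self.head = nn.Linear(slot_dim, act_dim)

    def forward(self, slots):                              # slots: (batch, n_slots, slot_dim)
        attn = torch.softmax(slots @ self.query, dim=-1)   # attention over object slots
        pooled = (attn.unsqueeze(-1) * slots).sum(dim=1)   # attention-weighted slot summary
        return self.head(pooled), attn


def object_aware_bc_loss(policy, slots, expert_actions, reg_weight=0.1):
    pred, attn = policy(slots)
    bc_loss = F.mse_loss(pred, expert_actions)
    # Push attention toward uniform over all semantic objects, so no single
    # (possibly spurious) object dominates the action prediction.
    uniform = torch.full_like(attn, 1.0 / attn.shape[-1])
    reg = F.kl_div(uniform.log(), attn, reduction="batchmean")  # KL(attn || uniform)
    return bc_loss + reg_weight * reg
```

The general pattern (a task loss plus a regularizer that spreads credit across objects) applies regardless of how the object slots are extracted from observations.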