ADAIL: Adaptive Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2008.12647v1
- Date: Sun, 23 Aug 2020 06:11:00 GMT
- Title: ADAIL: Adaptive Adversarial Imitation Learning
- Authors: Yiren Lu, Jonathan Tompson
- Abstract summary: We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics.
This is an important problem in robotic learning because in real-world scenarios 1) reward functions are hard to obtain, 2) learned policies from one domain are difficult to deploy in another due to varying source-to-target domain statistics, and 3) collecting expert demonstrations in multiple environments where the dynamics are known and controlled is often infeasible.
- Score: 11.270858993502705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for
learning adaptive policies that can be transferred between environments of
varying dynamics, by imitating a small number of demonstrations collected from
a single source domain. This is an important problem in robotic learning
because in real-world scenarios 1) reward functions are hard to obtain, 2)
learned policies from one domain are difficult to deploy in another due to
varying source-to-target domain statistics, and 3) collecting expert demonstrations
in multiple environments where the dynamics are known and controlled is often
infeasible. We address these constraints by building upon recent advances in
adversarial imitation learning; we condition our policy on a learned dynamics
embedding and we employ a domain-adversarial loss to learn a dynamics-invariant
discriminator. The effectiveness of our method is demonstrated on simulated
control tasks with varying environment dynamics and the learned adaptive agent
outperforms several recent baselines.
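The abstract names two components: a policy conditioned on a learned dynamics embedding, and a GAIL-style discriminator trained with a domain-adversarial loss so that it becomes dynamics-invariant. The paper provides no code here, so the following PyTorch sketch is only a minimal illustration of that structure under my own assumptions; the module names (DynamicsEncoder, AdaptivePolicy, GradReverse, Discriminator), layer sizes, and loss wiring are hypothetical and may differ from the actual ADAIL implementation.
```python
# Minimal sketch (not the authors' code): a dynamics-conditioned policy and a
# GAIL-style discriminator with a gradient-reversed domain head.
import torch
import torch.nn as nn


class DynamicsEncoder(nn.Module):
    """Maps a short rollout (states, actions) to a dynamics embedding z."""
    def __init__(self, obs_dim, act_dim, embed_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, states, actions):
        # states: (T, obs_dim), actions: (T, act_dim); pool over time steps.
        return self.net(torch.cat([states, actions], dim=-1)).mean(dim=0)


class AdaptivePolicy(nn.Module):
    """Policy conditioned on both the current observation and the embedding z."""
    def __init__(self, obs_dim, act_dim, embed_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad


class Discriminator(nn.Module):
    """Expert-vs-policy discriminator with a domain head trained through a
    gradient-reversal layer, pushing shared features toward dynamics invariance."""
    def __init__(self, obs_dim, act_dim, n_domains):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU())
        self.real_fake = nn.Linear(64, 1)        # imitation (expert vs. policy) logit
        self.domain = nn.Linear(64, n_domains)   # which training environment produced (s, a)

    def forward(self, obs, act):
        h = self.features(torch.cat([obs, act], dim=-1))
        return self.real_fake(h), self.domain(GradReverse.apply(h))
```
In such a setup, the domain head would be fit with a cross-entropy loss over environment labels; because its gradient is reversed before reaching the shared features, minimizing that loss pushes the features (and hence the imitation reward derived from the expert-vs-policy logit) to be less predictive of the dynamics, which is the domain-adversarial idea described in the abstract.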
Related papers
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
- Learn what matters: cross-domain imitation learning with task-relevant embeddings [77.34726150561087]
We study how an autonomous agent learns to perform a task from demonstrations in a different domain, such as a different environment or different agent.
We propose a scalable framework that enables cross-domain imitation learning without access to additional demonstrations or further domain knowledge.
arXiv Detail & Related papers (2022-09-24T21:56:58Z)
- Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement Learning [10.782043595405831]
We extend an empowerment-based regularization technique to situations with multiple tasks based on the framework of a generative adversarial network.
Under the multitask environments with unknown dynamics, we focus on learning a reward and policy from unlabeled expert examples.
Our proposed method derives a variational lower bound on the situational mutual information and optimizes it.
arXiv Detail & Related papers (2022-06-19T22:32:41Z)
- Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness [6.648670454325191]
A typical setting across current meta learning algorithms assumes a stationary task distribution during meta training.
We consider realistic scenarios where the task distribution is highly imbalanced and domain labels are unavailable.
We propose a kernel-based method for domain change detection and a difficulty-aware memory management mechanism.
arXiv Detail & Related papers (2021-09-29T00:53:09Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its ease of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models [7.653542219337937]
We present a framework for learning composite dynamical behaviors from expert demonstrations.
We learn a switching linear dynamical model with contacts encoded in switching conditions as a close approximation of our system dynamics.
We then use discrete-time LQR as the differentiable policy class for data-efficient learning of a control strategy.
arXiv Detail & Related papers (2021-03-26T04:40:24Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)
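As flagged in the IQ-Learn entry above, that paper replaces adversarial imitation with a single learned Q-function. The sketch below is a minimal, hedged Python/PyTorch illustration of one offline, discrete-action variant of the inverse soft-Q objective: maximize the expert reward term E[Q(s,a) - gamma * V(s')] minus (1 - gamma) * E[V(s0)], with the soft value V(s) = alpha * logsumexp(Q(s, .) / alpha). The function names, the omission of a divergence regularizer, and the toy data are my own simplifications, not the paper's reference implementation.
```python
# Illustrative sketch of an offline, discrete-action inverse soft-Q objective.
import torch
import torch.nn as nn


def soft_value(q_values, alpha=1.0):
    # V(s) = alpha * logsumexp(Q(s, .) / alpha): the entropy-regularized state value.
    return alpha * torch.logsumexp(q_values / alpha, dim=-1)


def iq_learn_loss(q_net, expert_obs, expert_act, expert_next_obs, init_obs,
                  gamma=0.99, alpha=1.0):
    """Maximize E_expert[Q(s,a) - gamma * V(s')] - (1 - gamma) * E_rho0[V(s0)],
    returned with a flipped sign so it can be minimized by a standard optimizer."""
    q_sa = q_net(expert_obs).gather(1, expert_act.unsqueeze(1)).squeeze(1)
    v_next = soft_value(q_net(expert_next_obs), alpha)
    v_init = soft_value(q_net(init_obs), alpha)
    reward_term = (q_sa - gamma * v_next).mean()   # implicit reward on expert data
    value_term = (1.0 - gamma) * v_init.mean()     # keeps the values anchored
    return -(reward_term - value_term)


# Toy usage on random data (dimensions are illustrative only).
obs_dim, n_actions = 4, 2
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
expert_obs = torch.randn(32, obs_dim)
expert_act = torch.randint(0, n_actions, (32,))
expert_next_obs = torch.randn(32, obs_dim)
init_obs = torch.randn(32, obs_dim)
loss = iq_learn_loss(q_net, expert_obs, expert_act, expert_next_obs, init_obs)
opt.zero_grad()
loss.backward()
opt.step()
```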
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.