Latent Policies for Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2206.11299v1
- Date: Wed, 22 Jun 2022 18:06:26 GMT
- Title: Latent Policies for Adversarial Imitation Learning
- Authors: Tianyu Wang, Nikhil Karnwal, Nikolay Atanasov
- Abstract summary: This paper considers learning robot locomotion and manipulation tasks from expert demonstrations.
Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent.
A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems.
- Score: 21.105328282702885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers learning robot locomotion and manipulation tasks from expert demonstrations. Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent. This generative adversarial training approach is very powerful but depends on a delicate balance between the discriminator and the generator training. In high-dimensional problems, the discriminator training may easily overfit or exploit associations with task-irrelevant features for transition classification. A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems. We use an action encoder-decoder model to obtain a low-dimensional latent action space and train a LAtent Policy using Adversarial imitation Learning (LAPAL). The encoder-decoder model can be trained offline from state-action pairs to obtain a task-agnostic latent action representation, or online, simultaneously with the discriminator and generator training, to obtain a task-aware latent action representation. We demonstrate that LAPAL training is stable, with near-monotonic performance improvement, and achieves expert performance in most locomotion and manipulation tasks, while a GAIL baseline converges more slowly and does not achieve expert performance in high-dimensional environments.
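The abstract describes two components that are easy to sketch in code: an action encoder-decoder that yields a low-dimensional latent action space, and a GAIL-style discriminator whose output defines the policy's reward. The sketch below is a minimal, hypothetical PyTorch rendering of these ideas, not the authors' implementation: all module names, network sizes, the state-conditioned decoder, and the specific reward form are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    # Fully connected network with ReLU activations on hidden layers.
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class ActionAutoencoder(nn.Module):
    # Maps raw actions to a low-dimensional latent action space and back.
    # Can be trained offline from state-action pairs (task-agnostic) or
    # online alongside the discriminator and generator (task-aware).
    def __init__(self, state_dim, action_dim, latent_dim):
        super().__init__()
        self.encoder = mlp([action_dim, 64, latent_dim])
        # Conditioning the decoder on the state is one plausible design
        # choice (an assumption here, not taken from the paper).
        self.decoder = mlp([latent_dim + state_dim, 64, action_dim])

    def encode(self, action):
        return self.encoder(action)

    def decode(self, latent, state):
        return self.decoder(torch.cat([latent, state], dim=-1))

class LatentDiscriminator(nn.Module):
    # Classifies (state, latent action) pairs as expert vs. agent.
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = mlp([state_dim + latent_dim, 64, 1])

    def forward(self, state, latent):
        return self.net(torch.cat([state, latent], dim=-1))

def gail_reward(disc, state, latent):
    # One common GAIL reward form: r = -log(1 - D(s, z)), with D the
    # sigmoid of the discriminator logit; the policy acts in the latent
    # space and its latent actions are decoded before reaching the robot.
    d = torch.sigmoid(disc(state, latent))
    return -torch.log(1.0 - d + 1e-8)
```

A full LAPAL-style training loop would alternate discriminator updates on expert versus agent latent transitions with policy updates that maximize gail_reward, optionally optimizing the autoencoder's reconstruction loss at the same time to obtain the task-aware latent representation the abstract mentions.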
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Conditional Neural Expert Processes for Learning Movement Primitives from Demonstration [1.9336815376402723]
Conditional Neural Expert Processes (CNEP) learns to assign demonstrations from different modes to distinct expert networks.
CNEP does not require supervision on which mode the trajectories belong to.
Our system is capable of on-the-fly adaptation to environmental changes via an online conditioning mechanism.
arXiv Detail & Related papers (2024-02-13T12:52:02Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert (a minimal sketch of this reward convention follows).
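As a rough, hypothetical sketch of the intervention-as-reward idea this entry describes (the -1/0 values are an assumption for illustration, not taken from the paper):

```python
def rlif_style_reward(intervened: bool) -> float:
    # The user's intervention signal itself is the reward: intervening
    # marks the transition as undesirable; otherwise the reward is neutral.
    return -1.0 if intervened else 0.0
```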
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator whose capacity is adjusted on the fly can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed DynamicD, improves synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration (see the sketch below).
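A minimal sketch of this dynamic weighting, assuming per-sample losses and a weight w in [0, 1] supplied by some gating mechanism (how ADVISOR actually computes w is elided here):

```python
import torch

def advisor_style_loss(imitation_loss: torch.Tensor,
                       rl_loss: torch.Tensor,
                       w: torch.Tensor) -> torch.Tensor:
    # Per-sample convex combination of a supervised imitation loss and a
    # reward-based RL loss; w -> 1 favors imitation, w -> 0 exploration.
    return (w * imitation_loss + (1.0 - w) * rl_loss).mean()
```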
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding [55.16953347580948]
Gradient-based adversarial training is widely used to improve the robustness of neural networks.
It cannot be easily adapted to natural language processing tasks, since the input token space is discrete.
We propose a Token-Aware Virtual Adversarial Training method to craft fine-grained perturbations.
arXiv Detail & Related papers (2020-04-30T02:03:24Z)
- Constrained-Space Optimization and Reinforcement Learning for Complex Tasks [42.648636742651185]
Learning from Demonstration is increasingly used for transferring operator manipulation skills to robots.
This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks.
arXiv Detail & Related papers (2020-04-01T21:50:11Z)
- ACNMP: Skill Transfer and Task Extrapolation through Learning from Demonstration and Reinforcement Learning via Representation Sharing [5.06461227260756]
ACNMPs can be used to implement skill transfer between robots with different morphologies.
We show the real-world suitability of ACNMPs through real robot experiments.
arXiv Detail & Related papers (2020-03-25T11:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.