Bridging the Imitation Gap by Adaptive Insubordination
- URL: http://arxiv.org/abs/2007.12173v3
- Date: Fri, 3 Dec 2021 18:53:42 GMT
- Title: Bridging the Imitation Gap by Adaptive Insubordination
- Authors: Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana
Lazebnik, Aniruddha Kembhavi, Alexander Schwing
- Abstract summary: We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
- Score: 88.35564081175642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practice, imitation learning is preferred over pure reinforcement learning
whenever it is possible to design a teaching agent to provide expert
supervision. However, we show that when the teaching agent makes decisions with
access to privileged information that is unavailable to the student, this
information is marginalized during imitation learning, resulting in an
"imitation gap" and, potentially, poor results. Prior work bridges this gap via
a progression from imitation learning to reinforcement learning. While often
successful, gradual progression fails for tasks that require frequent switches
between exploration and memorization. To better address these tasks and
alleviate the imitation gap we propose 'Adaptive Insubordination' (ADVISOR).
ADVISOR dynamically weights imitation and reward-based reinforcement learning
losses during training, enabling on-the-fly switching between imitation and
exploration. On a suite of challenging tasks set within gridworlds, multi-agent
particle environments, and high-fidelity 3D simulators, we show that on-the-fly
switching with ADVISOR outperforms pure imitation, pure reinforcement learning,
as well as their sequential and parallel combinations.
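To make the loss-weighting idea concrete, below is a minimal sketch, not the authors' implementation: the auxiliary imitation-only policy, the per-state weight computed from its agreement with the expert, and names such as `advisor_style_loss` and `temperature` are assumptions introduced here for illustration; the paper derives its weighting in more detail.

```python
# Minimal sketch (not the authors' code): per-state adaptive weighting between an
# imitation loss (cross-entropy to the expert action) and a policy-gradient RL loss.
import torch
import torch.nn.functional as F

def advisor_style_loss(student_logits, aux_logits, expert_actions,
                       log_probs_taken, advantages, temperature=1.0):
    """Combine imitation and RL losses with a per-state weight.

    The weight is large where an imitation-only auxiliary policy already agrees
    with the expert (imitation is informative there), and small where it does not
    (the expert's privileged information cannot be recovered, so rely on RL).
    """
    # Per-state imitation loss for the student policy.
    imitation = F.cross_entropy(student_logits, expert_actions, reduction="none")

    # Per-state weight from the auxiliary policy's agreement with the expert
    # (one plausible choice, used here only for exposition).
    with torch.no_grad():
        aux_ce = F.cross_entropy(aux_logits, expert_actions, reduction="none")
        weight = torch.exp(-aux_ce / temperature)  # in (0, 1], high when aux agrees

    # Simple policy-gradient surrogate as the RL term.
    rl = -(log_probs_taken * advantages)

    return (weight * imitation + (1.0 - weight) * rl).mean()
```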
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
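As a rough illustration of "intervention signals as rewards", the sketch below relabels logged trajectories so that steps which trigger a human intervention receive a negative reward before being handed to an off-policy learner; the `intervened` flag and the 0/-1 shaping are assumptions for exposition, not the paper's exact formulation.

```python
# Schematic sketch (not the RLIF authors' code): convert logged intervention flags
# into rewards for an off-policy RL learner.
from typing import Dict, List

def relabel_with_intervention_rewards(trajectory: List[Dict]) -> List[Dict]:
    """Each step is a dict with keys 'obs', 'action', 'next_obs', 'intervened'."""
    relabeled = []
    for step in trajectory:
        relabeled.append({
            **step,
            # No task reward is assumed: the only learning signal is whether the
            # (possibly suboptimal) expert felt compelled to take over.
            "reward": -1.0 if step["intervened"] else 0.0,
        })
    return relabeled  # feed into any off-policy RL replay buffer
```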
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Latent Policies for Adversarial Imitation Learning [21.105328282702885]
This paper considers learning robot locomotion and manipulation tasks from expert demonstrations.
Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent.
A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems.
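The GAIL mechanism summarized above can be sketched as follows; this is a generic illustration of discriminator-derived rewards, not the paper's code, and it omits the latent task space that is the paper's contribution. The function names and the concatenated state-action input are assumptions.

```python
# Generic GAIL-style reward and discriminator objective (illustrative only).
import torch
import torch.nn.functional as F

def gail_reward(discriminator: torch.nn.Module,
                states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Reward the agent for transitions the discriminator mistakes for expert data."""
    with torch.no_grad():
        logits = discriminator(torch.cat([states, actions], dim=-1))
        d = torch.sigmoid(logits)            # P(expert | s, a)
        return -torch.log(1.0 - d + 1e-8)    # a common GAIL reward form

def discriminator_loss(discriminator, expert_sa, agent_sa):
    """Binary classification: expert transitions -> 1, agent transitions -> 0."""
    expert_logits = discriminator(expert_sa)
    agent_logits = discriminator(agent_sa)
    return (F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits)))
```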
arXiv Detail & Related papers (2022-06-22T18:06:26Z)
- ReIL: A Framework for Reinforced Intervention-based Imitation Learning [3.0846824529023387]
We introduce Reinforced Intervention-based Learning (ReIL), a framework consisting of a general intervention-based learning algorithm and a multi-task imitation learning model.
Experimental results from real world mobile robot navigation challenges indicate that ReIL learns rapidly from sparse supervisor corrections without suffering deterioration in performance.
arXiv Detail & Related papers (2022-03-29T09:30:26Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adversarial Imitation Learning with Trajectorial Augmentation and Correction [61.924411952657756]
We introduce a novel augmentation method which preserves the success of the augmented trajectories.
We develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts.
Experiments show that our data augmentation strategy can improve accuracy and convergence time of adversarial imitation.
arXiv Detail & Related papers (2021-03-25T14:49:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.