Bridging the Imitation Gap by Adaptive Insubordination
- URL: http://arxiv.org/abs/2007.12173v3
- Date: Fri, 3 Dec 2021 18:53:42 GMT
- Title: Bridging the Imitation Gap by Adaptive Insubordination
- Authors: Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana
  Lazebnik, Aniruddha Kembhavi, Alexander Schwing
- Abstract summary: We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
- Score: 88.35564081175642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practice, imitation learning is preferred over pure reinforcement learning
whenever it is possible to design a teaching agent to provide expert
supervision. However, we show that when the teaching agent makes decisions with
access to privileged information that is unavailable to the student, this
information is marginalized during imitation learning, resulting in an
"imitation gap" and, potentially, poor results. Prior work bridges this gap via
a progression from imitation learning to reinforcement learning. While often
successful, gradual progression fails for tasks that require frequent switches
between exploration and memorization. To better address these tasks and
alleviate the imitation gap we propose 'Adaptive Insubordination' (ADVISOR).
ADVISOR dynamically weights imitation and reward-based reinforcement learning
losses during training, enabling on-the-fly switching between imitation and
exploration. On a suite of challenging tasks set within gridworlds, multi-agent
particle environments, and high-fidelity 3D simulators, we show that on-the-fly
switching with ADVISOR outperforms pure imitation, pure reinforcement learning,
as well as their sequential and parallel combinations.
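
The per-sample weighting of imitation and reward-based losses described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the weights are computed, so `example_weight`, its `alpha` parameter, and the `disagreement` field are hypothetical placeholders, and all function names are assumptions made for this sketch.

```python
import math


def cross_entropy(probs, expert_action):
    """Imitation loss: negative log-likelihood of the expert's action."""
    return -math.log(probs[expert_action])


def policy_gradient_loss(probs, taken_action, advantage):
    """Reward-based loss: REINFORCE-style surrogate, -advantage * log pi(a)."""
    return -advantage * math.log(probs[taken_action])


def mixed_loss(samples, weight_fn):
    """Average of w * imitation + (1 - w) * RL, with a per-sample w in [0, 1]."""
    total = 0.0
    for s in samples:
        w = weight_fn(s)
        il = cross_entropy(s["probs"], s["expert_action"])
        rl = policy_gradient_loss(s["probs"], s["taken_action"], s["advantage"])
        total += w * il + (1.0 - w) * rl
    return total / len(samples)


def example_weight(sample, alpha=4.0):
    """Hypothetical weight (an assumption, not taken from the abstract):
    follow the expert less as a non-negative disagreement score grows."""
    return math.exp(-alpha * sample["disagreement"])


# One sample where the student's policy is uniform over two actions.
samples = [{
    "probs": [0.5, 0.5],
    "expert_action": 0,
    "taken_action": 1,
    "advantage": 2.0,
    "disagreement": 0.0,
}]
loss = mixed_loss(samples, example_weight)
```

With a disagreement score of zero the weight is 1 and the loss reduces to pure imitation; a weight function that always returns 0 reduces it to the reward-based term alone. Letting the weight vary per sample is what allows switching between the two within a single training run.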
 
      
Related papers
- Scalable Strategies for Continual Learning with Replay [0.0]
 We show that replay can play a foundational role in continual learning, allowing models to reconcile new information with past knowledge.
In practice, however, replay is quite unscalable, doubling the cost of continual learning when applied naively.
We introduce consolidation, a phasic approach to replay which leads to up to 55% fewer replay samples being needed for a given performance target.
Then, we propose sequential merging, an offshoot of task arithmetic which is tailored to the continual learning setting and is shown to work well in combination with replay.
 arXiv  Detail & Related papers  (2025-05-18T18:23:50Z)
- Latent Action Priors for Locomotion with Deep Reinforcement Learning [42.642008092347986]
 Deep Reinforcement Learning (DRL) enables robots to learn complex behaviors through interaction with the environment.
We propose an inductive bias for learning locomotion that is especially useful for torque control.
We observe that the agent is not restricted to the reward levels of the demonstration, and performance in transfer tasks is improved significantly.
 arXiv  Detail & Related papers  (2024-10-04T09:10:56Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
 RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
 arXiv  Detail & Related papers  (2024-06-12T17:56:31Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
 We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
 arXiv  Detail & Related papers  (2023-11-21T21:05:21Z)
- Latent Policies for Adversarial Imitation Learning [21.105328282702885]
 This paper considers learning robot locomotion and manipulation tasks from expert demonstrations.
Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent.
A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems.
 arXiv  Detail & Related papers  (2022-06-22T18:06:26Z)
- ReIL: A Framework for Reinforced Intervention-based Imitation Learning [3.0846824529023387]
 We introduce Reinforced Intervention-based Learning (ReIL), a framework consisting of a general intervention-based learning algorithm and a multi-task imitation learning model.
 Experimental results from real world mobile robot navigation challenges indicate that ReIL learns rapidly from sparse supervisor corrections without suffering deterioration in performance.
 arXiv  Detail & Related papers  (2022-03-29T09:30:26Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
 We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
 arXiv  Detail & Related papers  (2022-01-27T19:51:09Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
 Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
 arXiv  Detail & Related papers  (2021-12-17T16:28:06Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
 We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
 arXiv  Detail & Related papers  (2021-12-16T14:58:08Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
 We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
 arXiv  Detail & Related papers  (2021-06-09T14:10:50Z)
- Adversarial Imitation Learning with Trajectorial Augmentation and Correction [61.924411952657756]
 We introduce a novel augmentation method which preserves the success of the augmented trajectories.
We develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts.
Experiments show that our data augmentation strategy can improve accuracy and convergence time of adversarial imitation.
 arXiv  Detail & Related papers  (2021-03-25T14:49:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     