Intrinsic Reward Driven Imitation Learning via Generative Model
- URL: http://arxiv.org/abs/2006.15061v4
- Date: Fri, 11 Sep 2020 09:40:12 GMT
- Title: Intrinsic Reward Driven Imitation Learning via Generative Model
- Authors: Xingrui Yu, Yueming Lyu and Ivor W. Tsang
- Abstract summary: Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration.
- Score: 48.97800481338626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning in a high-dimensional environment is challenging. Most
inverse reinforcement learning (IRL) methods fail to outperform the
demonstrator in such a high-dimensional environment, e.g., the Atari domain. To
address this challenge, we propose a novel reward learning module that generates
intrinsic reward signals via a generative model. Our generative model performs
forward state transition and backward action encoding more accurately, which
improves the module's ability to model the dynamics of the environment. The
module thus provides the imitation agent with both the intrinsic intention of
the demonstrator and better exploration ability, which is critical for the agent
to outperform the demonstrator. Empirical results show that our method
outperforms state-of-the-art IRL methods on multiple Atari games, even with a
one-life demonstration. Remarkably, our method achieves up to 5 times the
performance of the demonstration.
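To make the abstract's notions of forward state transition and backward action encoding concrete, the following is a minimal sketch, not the authors' implementation: a generative dynamics module that encodes consecutive states into an action code and decodes a predicted next state, with the forward prediction error used as the intrinsic reward. All module names, layer sizes, and loss choices below are illustrative assumptions.
```python
# Minimal sketch (not the paper's exact architecture): a generative dynamics
# module with a backward action encoder q(a_t | s_t, s_{t+1}) and a forward
# state-transition decoder p(s_{t+1} | s_t, a_t). The intrinsic reward is
# taken here to be the forward prediction error; sizes and losses are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenerativeRewardModule(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # Backward action encoder: infer an action code from (s_t, s_{t+1}).
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        # Forward state-transition decoder: predict s_{t+1} from (s_t, a_t).
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, s_next):
        a_logits = self.encoder(torch.cat([s, s_next], dim=-1))
        a_code = F.softmax(a_logits, dim=-1)           # soft action code
        s_next_pred = self.decoder(torch.cat([s, a_code], dim=-1))
        return a_logits, s_next_pred

    def training_loss(self, s, a, s_next):
        # a: Long tensor of demonstrated action indices.
        a_logits, s_next_pred = self(s, s_next)
        recon = F.mse_loss(s_next_pred, s_next)        # forward transition loss
        action = F.cross_entropy(a_logits, a)          # backward encoding loss
        return recon + action

    @torch.no_grad()
    def intrinsic_reward(self, s, a_onehot, s_next):
        # Reward the agent where the learned dynamics model is surprised,
        # i.e. where the forward prediction deviates from the observed s_{t+1}.
        s_next_pred = self.decoder(torch.cat([s, a_onehot], dim=-1))
        return F.mse_loss(s_next_pred, s_next, reduction="none").mean(dim=-1)
```
Rewarding the agent where the learned dynamics model is surprised is one common way to turn a generative model into an intrinsic reward signal; the paper's exact formulation may differ.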
Related papers
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
- Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we aim to improve sample efficiency by introducing a novel inverse reinforcement learning scheme.
arXiv Detail & Related papers (2021-11-23T08:06:09Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials; a generic sketch of potential-based shaping appears after this list.
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
- REMAX: Relational Representation for Multi-Agent Exploration [13.363887960136102]
We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
arXiv Detail & Related papers (2020-08-12T10:23:35Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that can learn to imitate behaviors from a given video observation.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
arXiv Detail & Related papers (2019-01-22T06:46:19Z)
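Regarding the reward-shaping entry above: the snippet below is a generic illustration of potential-based reward shaping, where a potential phi learned from demonstrations (the cited paper represents state-and-action-dependent potentials with normalizing flows or GANs) is added to the environment reward as gamma * phi(s') - phi(s). It is a sketch under those assumptions, not the cited paper's implementation; the placeholder potential here is purely illustrative.
```python
# Generic illustration (not the cited paper's code): potential-based reward
# shaping. Adding gamma * phi(s') - phi(s) to the environment reward preserves
# the optimal policy while steering exploration toward regions the potential
# marks as valuable. phi is assumed to come from a model of expert behavior.
from typing import Callable
import numpy as np

def shaped_reward(
    r_env: float,
    s: np.ndarray,
    s_next: np.ndarray,
    phi: Callable[[np.ndarray], float],
    gamma: float = 0.99,
) -> float:
    """Return the environment reward plus the potential-based shaping term."""
    return r_env + gamma * phi(s_next) - phi(s)

# Example usage with a toy potential that prefers states close to a goal.
goal = np.array([1.0, 0.0])
phi = lambda s: -float(np.linalg.norm(s - goal))  # placeholder potential
r = shaped_reward(r_env=0.0, s=np.zeros(2), s_next=np.array([0.5, 0.0]), phi=phi)
```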