Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments
- URL: http://arxiv.org/abs/2012.02527v1
- Date: Fri, 4 Dec 2020 11:18:02 GMT
- Title: Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments
- Authors: Alessandro Sestini, Alexander Kuhnle and Andrew D. Bagdanov
- Abstract summary: Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
- Score: 137.86426963572214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning achieves very good results in domains where
reward functions can be manually engineered. At the same time, there is growing
interest within the community in using games based on Procedural Content
Generation (PCG) as benchmark environments since this type of environment is
perfect for studying overfitting and generalization of agents under domain
shift. Inverse Reinforcement Learning (IRL) can instead extrapolate reward
functions from expert demonstrations, with good results even on
high-dimensional problems; however, there are no examples of applying these
techniques to procedurally-generated environments. This is mostly due to the
number of demonstrations needed to find a good reward model. We propose a
technique based on Adversarial Inverse Reinforcement Learning which can
significantly decrease the need for expert demonstrations in PCG games. Through
the use of an environment with a limited set of initial seed levels, plus some
modifications to stabilize training, we show that our approach, DE-AIRL, is
demonstration-efficient and still able to extrapolate reward functions which
generalize to the fully procedural domain. We demonstrate the effectiveness of
our technique on two procedural environments, MiniGrid and DeepCrawl, for a
variety of tasks.
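To make the seed-restriction idea from the abstract concrete, below is a minimal, self-contained Python sketch; it is not the authors' code. A toy seeded environment stands in for a PCG game such as MiniGrid, demonstrations and adversarial reward learning are confined to a handful of fixed seed levels, and the recovered reward is then queried on unseen seeds. The environment, the scripted expert, and the linear logistic "discriminator" (which omits AIRL's policy-density term and the alternating RL policy updates) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyPCGEnv:
    """A 1-D 'reach the goal' level whose goal position is drawn from a seed.
    A stand-in for a procedurally generated game, not the paper's environment."""

    def __init__(self, seed, size=8):
        self.size = size
        self.goal = int(np.random.default_rng(seed).integers(1, size))
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self._features()

    def step(self, action):                      # action: 0 = left, 1 = right
        self.pos = int(np.clip(self.pos + (1 if action == 1 else -1), 0, self.size - 1))
        return self._features(), self.pos == self.goal

    def _features(self):
        # Relative features: a bias term and the (negated, scaled) distance to the goal.
        return np.array([1.0, -abs(self.pos - self.goal) / self.size])


def rollout(env, policy, max_steps=20):
    """Return the feature vectors of states visited by `policy` in one episode."""
    env.reset()
    visited = []
    for _ in range(max_steps):
        obs, done = env.step(policy(env))
        visited.append(obs)
        if done:
            break
    return visited


expert = lambda env: 1 if env.pos < env.goal else 0        # scripted demonstrator
random_policy = lambda env: int(rng.integers(2))

# 1) Demonstrations come only from a small fixed set of seed levels
#    (the "limited set of initial seed levels" in the abstract).
SEED_LEVELS = [0, 1, 2]
expert_states = [s for seed in SEED_LEVELS for s in rollout(ToyPCGEnv(seed), expert)]

# 2) AIRL-style adversarial reward learning, reduced to its simplest form: a linear
#    reward f(s) = w . phi(s) trained as a logistic discriminator separating expert
#    states from states visited by the current (here: random) policy, always on the
#    same restricted seed set.
w = np.zeros(2)
for _ in range(500):
    policy_states = [s for seed in SEED_LEVELS for s in rollout(ToyPCGEnv(seed), random_policy)]
    grad = np.zeros_like(w)
    for phi in expert_states:                    # push reward up on expert states
        grad += (1.0 - 1.0 / (1.0 + np.exp(-w @ phi))) * phi
    for phi in policy_states:                    # push reward down on policy states
        grad -= (1.0 / (1.0 + np.exp(-w @ phi))) * phi
    w += 0.05 * grad / (len(expert_states) + len(policy_states))

# 3) Query the recovered reward on *unseen* procedurally generated levels; because the
#    reward is defined over relative features, it transfers beyond the training seeds.
for seed in (100, 101, 102):
    env = ToyPCGEnv(seed)
    start = env.reset()
    env.pos = env.goal                           # teleport to the goal just to query the reward
    print(f"seed {seed}: reward at start {w @ start:.3f}, at goal {w @ env._features():.3f}")
```

The point of the sketch is the structure, not the numbers: all learning happens on the fixed seed list, and generalization is assessed only by evaluating the learned reward on levels outside that list, mirroring the paper's evaluation of DE-AIRL on the fully procedural domain.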
Related papers
- Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Sample Efficient Imitation Learning via Reward Function Trained in
Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning.
arXiv Detail & Related papers (2021-11-23T08:06:09Z) - Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep
Reinforcement Learning with Demonstration-like Sampled Exploration [7.930709072852582]
We propose a generic framework for Learning from Demonstration (LfD) based on actor-critic algorithms.
We conduct experiments on 4 standard benchmark environments in MuJoCo and 2 self-designed robotic environments.
arXiv Detail & Related papers (2021-09-27T12:42:05Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - GAN-Based Interactive Reinforcement Learning from Demonstration and
Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested our proposed method in six physics-based control tasks.
arXiv Detail & Related papers (2021-04-14T02:58:51Z) - Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z) - oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally
Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
arXiv Detail & Related papers (2020-02-20T22:21:41Z)