Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations
- URL: http://arxiv.org/abs/2006.09939v1
- Date: Wed, 17 Jun 2020 15:38:40 GMT
- Title: Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations
- Authors: Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov,
Vasilii Davydov, Aleksandr I. Panov
- Abstract summary: In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
- Score: 55.41644538483948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, deep reinforcement learning (RL) shows impressive results in
complex gaming and robotic environments. Often these results are achieved at
the expense of huge computational costs and require an enormous number of
episodes of interaction between the agent and the environment. There are two
main approaches to improving the sample efficiency of reinforcement learning
methods: using hierarchical methods and using expert demonstrations. In this
paper, we propose a combination of these approaches that allows the agent to use
low-quality demonstrations in complex vision-based environments with multiple
related goals. Our forgetful experience replay (ForgER) algorithm effectively
handles errors in expert data and reduces quality losses when adapting the
action space and state representation to the agent's capabilities. Our
proposed goal-oriented structuring of replay buffer allows the agent to
automatically highlight sub-goals for solving complex hierarchical tasks in
demonstrations. Our method is universal and can be integrated into various
off-policy methods. It surpasses existing state-of-the-art RL methods that use
expert demonstrations on various model environments. The solution based
on our algorithm beats all the solutions for the famous MineRL competition and
allows the agent to mine a diamond in the Minecraft environment.
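The abstract names two mechanisms without implementation details: a replay buffer structured by sub-goal, and a gradually reduced reliance on expert transitions (the "forgetting" in ForgER). As a rough, non-authoritative illustration of how such a buffer could look, the Python sketch below partitions transitions by sub-goal and anneals the expert share of each batch with a linear schedule; the class, method names, and schedule are assumptions made for illustration, not the authors' code.

```python
import random
from collections import defaultdict, deque


class GoalPartitionedReplayBuffer:
    """Replay buffer partitioned by sub-goal, with expert and agent
    transitions stored separately so the expert share of each batch
    can be annealed ("forgotten") over training."""

    def __init__(self, capacity_per_goal=50_000):
        self.expert = defaultdict(lambda: deque(maxlen=capacity_per_goal))
        self.agent = defaultdict(lambda: deque(maxlen=capacity_per_goal))

    def add(self, goal, transition, from_expert=False):
        store = self.expert if from_expert else self.agent
        store[goal].append(transition)

    def sample(self, goal, batch_size, expert_fraction):
        """Mix expert and agent transitions for one sub-goal according to
        the current expert fraction."""
        n_expert = min(int(batch_size * expert_fraction), len(self.expert[goal]))
        n_agent = min(batch_size - n_expert, len(self.agent[goal]))
        batch = random.sample(list(self.expert[goal]), n_expert) + \
                random.sample(list(self.agent[goal]), n_agent)
        random.shuffle(batch)
        return batch


def expert_fraction(step, start=0.5, end=0.05, decay_steps=1_000_000):
    """One possible 'forgetting' schedule: linearly decay the share of
    expert transitions in each sampled batch."""
    return start + (end - start) * min(step / decay_steps, 1.0)
```

An off-policy learner would then draw each training batch with something like buffer.sample(current_subgoal, 64, expert_fraction(step)); the actual ForgER buffer layout and forgetting schedule may well differ.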
Related papers
- DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction [4.813546138483559]
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data.
In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods.
We propose a novel method, Disentangled Environment and Agent Representations, that uses the segmentation mask of the agent as supervision.
arXiv Detail & Related papers (2024-06-30T09:15:21Z)
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations, as well as on more challenging tasks whose backgrounds differ from the expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football [0.0]
We create a novel reward shaping method by embedding contextual information in the reward function.
We demonstrate this in the Google Research Football (GRF) environment.
Experimental results show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signals.
arXiv Detail & Related papers (2023-03-25T10:21:13Z)
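The summary above does not say how the contextual information enters the reward, so the following is only a generic, hedged illustration of reward shaping in a Gym-style environment (GRF exposes a Gym-compatible interface): a potential-based bonus computed from a user-supplied context feature. The ContextRewardShaping wrapper and the context_potential callback are assumptions for illustration, not the paper's method.

```python
import gym  # classic Gym step API assumed: (obs, reward, done, info)


class ContextRewardShaping(gym.Wrapper):
    """Illustrative potential-based shaping:
    r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s),
    where phi is computed from contextual features of the observation."""

    def __init__(self, env, context_potential, gamma=0.99):
        super().__init__(env)
        self.context_potential = context_potential  # hypothetical, user-supplied
        self.gamma = gamma
        self._prev_phi = 0.0

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._prev_phi = self.context_potential(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        phi = self.context_potential(obs)
        shaped_reward = reward + self.gamma * phi - self._prev_phi
        self._prev_phi = phi
        return obs, shaped_reward, done, info
```

Potential-based shaping is only one way to embed context; it is shown here because it preserves optimal policies, but the paper may use a different formulation.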
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require a large number of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration [7.930709072852582]
We propose a generic framework for Learning from Demonstration (LfD) based on actor-critic algorithms.
We conduct experiments on 4 standard benchmark environments in MuJoCo and 2 self-designed robotic environments.
arXiv Detail & Related papers (2021-09-27T12:42:05Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
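As the title suggests, PEBBLE relabels stored experience whenever its learned reward model is updated from new preference feedback, so that off-policy updates train on consistent reward targets. The sketch below illustrates only that relabeling step; the buffer and reward-model interfaces are hypothetical, not PEBBLE's actual implementation.

```python
import numpy as np


def relabel_rewards(buffer, reward_model):
    """Recompute the reward stored with every transition using the current
    learned reward model. Hypothetical interfaces: `buffer` exposes
    `observations`, `actions`, `rewards` arrays, and `reward_model.predict`
    maps concatenated (obs, action) pairs to scalar rewards."""
    inputs = np.concatenate([buffer.observations, buffer.actions], axis=-1)
    buffer.rewards = reward_model.predict(inputs)
    return buffer
```

A training loop would call this right after each reward-model update, before the next round of off-policy gradient steps.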
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
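The last entry describes an episodic intrinsic reward computed with k-nearest neighbours over embeddings of the agent's recent experience, where the embeddings are trained with a self-supervised inverse dynamics model. The following is a simplified, hedged sketch of such a bonus; the kernel and constants are illustrative choices, not the paper's exact formulation.

```python
import numpy as np


class EpisodicNoveltyBonus:
    """Sketch of an episodic novelty bonus: keep the embeddings visited in
    the current episode and reward states whose k nearest neighbours in
    embedding space are far away. Embeddings would come from a
    self-supervised inverse dynamics model, as the summary describes."""

    def __init__(self, k=10, eps=1e-3):
        self.k = k
        self.eps = eps
        self.memory = []  # embeddings seen so far in this episode

    def reset(self):
        self.memory.clear()

    def bonus(self, embedding):
        embedding = np.asarray(embedding, dtype=np.float64)
        past = np.asarray(self.memory, dtype=np.float64)
        self.memory.append(embedding)
        if len(past) < self.k:
            return 1.0  # everything looks novel early in the episode
        dists = np.linalg.norm(past - embedding, axis=1)
        knn = np.sort(dists)[: self.k]
        # Inverse kernel: close neighbours -> high similarity -> small bonus.
        similarity = np.sum(self.eps / (knn ** 2 + self.eps))
        return float(1.0 / np.sqrt(similarity + 1e-8))
```

The full agent combines this episodic signal with other components that the sketch omits.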
This list is automatically generated from the titles and abstracts of the papers on this site.