Sample Efficient Reinforcement Learning through Learning from
Demonstrations in Minecraft
- URL: http://arxiv.org/abs/2003.06066v1
- Date: Thu, 12 Mar 2020 23:46:16 GMT
- Title: Sample Efficient Reinforcement Learning through Learning from
Demonstrations in Minecraft
- Authors: Christian Scheller, Yanick Schraner and Manfred Vogel
- Abstract summary: We show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction.
Our solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
- Score: 4.3952888284140785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample inefficiency of deep reinforcement learning methods is a major
obstacle for their use in real-world applications. In this work, we show how
human demonstrations can improve final performance of agents on the Minecraft
minigame ObtainDiamond with only 8M frames of environment interaction. We
propose a training procedure where policy networks are first trained on human
data and later fine-tuned by reinforcement learning. Using a policy
exploitation mechanism, experience replay and an additional loss against
catastrophic forgetting, our best agent was able to achieve a mean score of 48.
Our proposed solution placed 3rd in the NeurIPS MineRL Competition for
Sample-Efficient Reinforcement Learning.
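The training procedure summarized in the abstract can be illustrated with a short sketch: pre-train the policy network on human demonstrations by behavioural cloning, then fine-tune it with reinforcement learning while an auxiliary loss on demonstration data counteracts catastrophic forgetting. The code below is a minimal illustration, not the authors' implementation; the network architecture, the actor-critic style update, and the auxiliary-loss weight lam are assumptions, since the abstract does not name the RL algorithm or the loss weighting.

```python
# Minimal sketch (not the authors' code) of the two-phase procedure described
# in the abstract: behavioural-cloning pre-training on human demonstrations,
# then RL fine-tuning with an auxiliary demonstration loss against forgetting.
# The network size, the actor-critic style update and the weight `lam` are
# illustrative assumptions; the abstract does not specify them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.logits = nn.Linear(128, n_actions)  # action head
        self.value = nn.Linear(128, 1)           # value head used during RL fine-tuning

    def forward(self, obs):
        h = self.body(obs)
        return self.logits(h), self.value(h).squeeze(-1)


def pretrain_on_demos(policy, demo_obs, demo_act, epochs=5, lr=1e-3):
    """Phase 1: behavioural cloning on (observation, action) pairs from humans."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        logits, _ = policy(demo_obs)
        loss = F.cross_entropy(logits, demo_act)
        opt.zero_grad(); loss.backward(); opt.step()


def finetune_step(policy, opt, obs, act, adv, ret, demo_obs, demo_act, lam=0.1):
    """Phase 2: one policy-gradient update plus an auxiliary BC loss on demos."""
    logits, value = policy(obs)
    logp = F.log_softmax(logits, dim=-1).gather(1, act.unsqueeze(1)).squeeze(1)
    pg_loss = -(logp * adv).mean()                    # policy-gradient term
    v_loss = F.mse_loss(value, ret)                   # value regression
    demo_logits, _ = policy(demo_obs)
    bc_loss = F.cross_entropy(demo_logits, demo_act)  # loss against catastrophic forgetting
    loss = pg_loss + 0.5 * v_loss + lam * bc_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


# Toy usage with random data in place of MineRL observations.
policy = PolicyNet(obs_dim=16, n_actions=6)
demo_obs, demo_act = torch.randn(64, 16), torch.randint(0, 6, (64,))
pretrain_on_demos(policy, demo_obs, demo_act)
```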
Related papers
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble [6.861783783234304]
We propose a learning from demonstrations method named A-SILfD.
A-SILfD treats expert demonstrations as the agent's successful experiences and uses them to constrain policy improvement.
In four Mujoco continuous control tasks, A-SILfD can significantly outperform baseline methods after 150,000 steps of online training.
arXiv Detail & Related papers (2022-12-07T10:29:13Z)
- Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning [0.0]
We use a single human example collected through a simple-to-use virtual reality simulation to assist with RL training.
Our method augments a single demonstration to generate numerous human-like demonstrations.
Despite learning from a human example, the agent is not constrained to human-level performance.
arXiv Detail & Related papers (2022-09-22T19:04:43Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Perceiving the World: Question-guided Reinforcement Learning for Text-based Games [64.11746320061965]
This paper introduces world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment.
We then propose a two-phase training framework to decouple language learning from reinforcement learning, which further improves the sample efficiency.
arXiv Detail & Related papers (2022-03-20T04:23:57Z)
- Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning [2.8783296093434148]
We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning.
Our results suggest that while agents trained by pure self-exploration and pure demonstration had similar success rates, the pure-demonstration model converged faster to solutions requiring fewer steps (a minimal replay-mixing sketch appears after this list).
arXiv Detail & Related papers (2021-07-14T16:55:30Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm outperforms all solutions from the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks [70.56451186797436]
We study how to use meta-reinforcement learning to solve the bulk of the problem in simulation.
We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks.
arXiv Detail & Related papers (2020-04-29T18:00:22Z)
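For the "Mixing Human Demonstrations with Self-Exploration in Experience Replay" entry above, a common way to realise such mixing is to keep demonstration transitions and the agent's own transitions in separate buffers and draw each training batch from both. The following sketch only illustrates that general idea under assumed parameters; it is not the method of the cited paper.

```python
# Minimal sketch of keeping demonstration and self-exploration transitions in
# separate buffers and mixing them in each training batch. The constant
# `demo_fraction`, the buffer capacity and the class itself are illustrative
# assumptions, not details taken from the cited paper.
import random
from collections import deque


class MixedReplayBuffer:
    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.demos = []                          # human transitions, kept permanently
        self.self_play = deque(maxlen=capacity)  # agent's own transitions, FIFO-evicted
        self.demo_fraction = demo_fraction

    def add_demo(self, transition):
        self.demos.append(transition)

    def add_self(self, transition):
        self.self_play.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share of the batch from demonstrations, the rest from
        # self-exploration, then shuffle so the two sources are interleaved.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        n_self = min(batch_size - n_demo, len(self.self_play))
        batch = random.sample(self.demos, n_demo) + random.sample(list(self.self_play), n_self)
        random.shuffle(batch)
        return batch
```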
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.