JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical
Reinforcement Learning
- URL: http://arxiv.org/abs/2112.04907v1
- Date: Tue, 7 Dec 2021 09:24:49 GMT
- Title: JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical
Reinforcement Learning
- Authors: Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang
- Abstract summary: We propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task.
To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for
- Score: 13.57305458734617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rational behaviors in open-world games like Minecraft remains to be
challenging for Reinforcement Learning (RL) research due to the compound
challenge of partial observability, high-dimensional visual perception and
delayed reward. To address this, we propose JueWu-MC, a sample-efficient
hierarchical RL approach equipped with representation learning and imitation
learning to deal with perception and exploration. Specifically, our approach
includes two levels of hierarchy, where the high-level controller learns a
policy to control over options and the low-level workers learn to solve each
sub-task. To boost the learning of sub-tasks, we propose a combination of
techniques including 1) action-aware representation learning which captures
underlying relations between action and representation, 2) discriminator-based
self-imitation learning for efficient exploration, and 3) ensemble behavior
cloning with consistency filtering for policy robustness. Extensive experiments
show that JueWu-MC significantly improves sample efficiency and outperforms a
set of baselines by a large margin. Notably, we won the championship of the
NeurIPS MineRL 2021 research competition and achieved the highest performance
score ever.
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z) - M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL)
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z) - Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, our aim is to train agent with efficient learning by decoupling exploration and utilization, so that agent can escaping the conundrum of suboptimal Solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL)
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Decoupled Adversarial Contrastive Learning for Self-supervised
Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL)
arXiv Detail & Related papers (2022-07-22T06:30:44Z) - Strategically Efficient Exploration in Competitive Multi-agent
Reinforcement Learning [25.041622707261897]
This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings.
We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play.
To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games.
arXiv Detail & Related papers (2021-07-30T15:22:59Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Hierarchical Reinforcement Learning in StarCraft II with Human Expertise
in Subgoals Selection [13.136763521789307]
We propose a new method to integrate HRL, experience replay and effective subgoal selection through an implicit curriculum design based on human expertise.
Our method can achieve better sample efficiency than flat and end-to-end RL methods, and provides an effective method for explaining the agent's performance.
arXiv Detail & Related papers (2020-08-08T04:56:30Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.