BeBold: Exploration Beyond the Boundary of Explored Regions
- URL: http://arxiv.org/abs/2012.08621v1
- Date: Tue, 15 Dec 2020 21:26:54 GMT
- Title: BeBold: Exploration Beyond the Boundary of Explored Regions
- Authors: Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph
E. Gonzalez, Yuandong Tian
- Abstract summary: In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
- Score: 66.88415950549556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient exploration under sparse rewards remains a key challenge in deep
reinforcement learning. To guide exploration, previous work makes extensive use
of intrinsic reward (IR). There are many heuristics for IR, including
visitation counts, curiosity, and state-difference. In this paper, we analyze
the pros and cons of each method and propose the regulated difference of
inverse visitation counts as a simple but effective criterion for IR. The
criterion helps the agent explore Beyond the Boundary of explored regions and
mitigates common issues in count-based methods, such as short-sightedness and
detachment. The resulting method, BeBold, solves the 12 most challenging
procedurally-generated tasks in MiniGrid with just 120M environment steps,
without any curriculum learning. In comparison, the previous SoTA only solves
50% of the tasks. BeBold also achieves SoTA on multiple tasks in NetHack, a
popular rogue-like game that contains more challenging procedurally-generated
environments.
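To make the criterion concrete, below is a minimal Python sketch of a "regulated difference of inverse visitation counts" bonus for discrete, hashable states. The class name, the count bookkeeping, and the episodic-restriction detail are illustrative assumptions drawn from the abstract and the paper's description, not the authors' reference implementation.
```python
from collections import defaultdict

class BeBoldStyleBonus:
    """Illustrative 'regulated difference of inverse visitation counts' bonus
    for discrete, hashable states (a sketch, not the authors' implementation)."""

    def __init__(self):
        self.lifelong_count = defaultdict(int)   # N(s): visits over all of training
        self.episode_count = defaultdict(int)    # N_e(s): visits in the current episode

    def reset_episode(self):
        # Call at the start of each episode.
        self.episode_count.clear()

    def bonus(self, s, s_next):
        # Count the visit to the successor state.
        self.lifelong_count[s_next] += 1
        self.episode_count[s_next] += 1

        n_s = max(self.lifelong_count[s], 1)   # avoid division by zero for the initial state
        n_next = self.lifelong_count[s_next]

        # Regulated (clipped) difference of inverse visitation counts:
        # positive only when s_next has been visited less often than s,
        # i.e. when the transition crosses the boundary of the explored region.
        r_int = max(1.0 / n_next - 1.0 / n_s, 0.0)

        # Episodic restriction (as described in the paper): reward only the
        # first visit to s_next within the current episode.
        if self.episode_count[s_next] > 1:
            r_int = 0.0
        return r_int
```
In practice the bonus would be added to the environment reward with a scaling coefficient, and for high-dimensional observations the raw counts would have to be replaced by a learned approximation; this sketch does not attempt that.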
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
- Exploration via Elliptical Episodic Bonuses [22.404871878551354]
We introduce Exploration via Elliptical Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses to continuous state spaces.
Our method sets a new state-of-the-art across 16 challenging tasks from the MiniHack suite, without requiring task-specific inductive biases.
E3B also matches existing methods on sparse reward, pixel-based VizDoom environments, and outperforms existing methods in reward-free exploration on Habitat.
arXiv Detail & Related papers (2022-10-11T22:10:23Z)
- Exploration in Deep Reinforcement Learning: A Survey [4.066140143829243]
Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, rewards occur rarely, so an agent acting randomly is unlikely to encounter them.
This review provides a comprehensive overview of existing exploration approaches.
arXiv Detail & Related papers (2022-05-02T12:03:44Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
arXiv Detail & Related papers (2021-01-20T14:22:01Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation (a rough illustrative sketch of this style of bonus appears after this list).
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
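As a rough illustration of the impact-driven bonus described in the RIDE entry above, the following Python sketch rewards the change in a learned state embedding, discounted by an episodic visitation count. The function name, the assumption that an embedding phi is already available (in the paper it is learned with auxiliary dynamics losses), and the count-based discount are assumptions drawn from the summary, not the paper's exact implementation.
```python
import numpy as np

def impact_driven_bonus(phi_s, phi_s_next, episodic_count_s_next):
    """Sketch of an impact-driven intrinsic reward: the L2 change in a learned
    state embedding (assumed to be trained separately), scaled down for states
    revisited within the current episode."""
    impact = np.linalg.norm(phi_s_next - phi_s)               # change in representation
    return impact / np.sqrt(max(episodic_count_s_next, 1))    # discourage revisiting
```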
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.