BYOL-Explore: Exploration by Bootstrapped Prediction
- URL: http://arxiv.org/abs/2206.08332v1
- Date: Thu, 16 Jun 2022 17:36:15 GMT
- Title: BYOL-Explore: Exploration by Bootstrapped Prediction
- Authors: Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
- Abstract summary: BYOL-Explore is a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.
We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark.
- Score: 49.221173336814225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present BYOL-Explore, a conceptually simple yet general approach for
curiosity-driven exploration in visually-complex environments. BYOL-Explore
learns a world representation, the world dynamics, and an exploration policy
all-together by optimizing a single prediction loss in the latent space with no
additional auxiliary objective. We show that BYOL-Explore is effective in
DM-HARD-8, a challenging partially-observable continuous-action
hard-exploration benchmark with visually-rich 3-D environments. On this
benchmark, we solve the majority of the tasks purely through augmenting the
extrinsic reward with BYOL-Explore's intrinsic reward, whereas prior work could
only get off the ground with human demonstrations. As further evidence of the
generality of BYOL-Explore, we show that it achieves superhuman performance on
the ten hardest exploration games in Atari while having a much simpler design
than other competitive agents.
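Concretely, the single objective is a BYOL-style bootstrapped prediction loss: an online network predicts the embedding of future observations produced by a target network that is an exponential moving average of the online one, and the per-timestep prediction error is reused as the intrinsic reward. Below is a minimal, single-step numpy sketch of that idea; the linear "encoders", dimensions, and intrinsic-reward coefficient are illustrative placeholders (the actual agent uses convolutional encoders, a recurrent world model with multi-step open-loop predictions, and reward normalization).

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
obs_dim, latent_dim = 64, 32

# Linear stand-ins for the paper's convolutional encoder, world model, and predictor.
online_encoder = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(obs_dim)
predictor = rng.normal(size=(latent_dim, latent_dim)) / np.sqrt(latent_dim)
target_encoder = online_encoder.copy()  # EMA copy of the online encoder

def intrinsic_reward(obs, next_obs):
    """Latent prediction error used as the curiosity bonus (single-step version)."""
    pred = l2_normalize(obs @ online_encoder @ predictor)  # online prediction of the next latent
    target = l2_normalize(next_obs @ target_encoder)       # EMA target embedding (treated as fixed during training)
    return float(np.sum((pred - target) ** 2))             # the same quantity is the training loss

def ema_update(online, target, tau=0.99):
    """Slowly track the online encoder with the target encoder."""
    return tau * target + (1.0 - tau) * online

obs, next_obs = rng.normal(size=(2, obs_dim))
r_int = intrinsic_reward(obs, next_obs)
r_ext = 0.0                         # environment reward for this transition
total_reward = r_ext + 0.1 * r_int  # 0.1 is an illustrative intrinsic-reward scale
target_encoder = ema_update(online_encoder, target_encoder)
```

Because the same quantity serves as both the representation/dynamics loss and the curiosity bonus, no auxiliary objective is needed, which is the simplicity the abstract emphasizes.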
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z)
- Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
A key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration' (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
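As a toy illustration of the "first return, then explore" pattern described in the entry above, the sketch below maintains an archive of reached cells in a small deterministic grid world, repeatedly returns to an archived cell, and then post-explores with random actions. The grid world, uniform cell selection, and random post-exploration are simplifications; Go-Explore-style agents use visitation-weighted selection and a restorable simulator or a learned return policy.

```python
import random

GRID = 8  # toy deterministic grid world; a state is an (x, y) cell

def step(state, action):
    x, y = state
    dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][action]
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def first_return_then_explore(archive, n_iterations=200, n_post_steps=10):
    """Go-Explore-style loop: pick an archived state, return to it, then post-explore."""
    for _ in range(n_iterations):
        # "First go": select a previously reached state (uniformly here).
        state = random.choice(list(archive))
        # Returning is trivial in this toy world; in general it requires a
        # restorable simulator or a goal-conditioned return policy.
        # Post-exploration: random actions from the restored state,
        # archiving every newly reached cell.
        for _ in range(n_post_steps):
            state = step(state, random.randrange(4))
            archive.add(state)
    return archive

archive = first_return_then_explore({(0, 0)})
print(f"discovered {len(archive)} of {GRID * GRID} cells")
```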
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose Rollback-Explore (RbExplore), an exploration method that utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Benchmarking the Spectrum of Agent Capabilities [7.088856621650764]
We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment.
Agents learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements.
We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores for reward agents and unsupervised agents.
arXiv Detail & Related papers (2021-09-14T15:49:31Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
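A toy sketch of the zero-sum reward structure the entry above describes: "surprise" is scored as the negative log-probability of an observation under a simple count-based model, the explorer is rewarded for surprising observations, and the controller receives the opposite reward. The discrete observations and the count-based density model are illustrative stand-ins for the learned models such a method would use; the two competing policies themselves are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
counts = np.ones(10)  # Laplace-smoothed counts over 10 discrete observation symbols

def surprise(obs):
    """Negative log-probability of an observation under the count-based model."""
    probs = counts / counts.sum()
    return -np.log(probs[obs])

for t in range(5):
    obs = rng.integers(10)                        # placeholder for the environment's observation
    s = surprise(obs)
    explorer_reward, controller_reward = s, -s    # zero-sum: the two policies compete over surprise
    counts[obs] += 1                              # update the observation model online
    print(f"t={t} obs={obs} surprise={s:.2f}")
```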
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
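To make the previous entry's claim concrete, the snippet below compares how a 1/n bonus decays against the more common 1/sqrt(n) count-based bonus as the visit count n(s, a) grows; the unit constants are placeholders rather than the paper's actual bonus terms.

```python
import numpy as np

n = np.arange(1, 11)           # visit counts n(s, a)
ucb_bonus = 1.0 / np.sqrt(n)   # classical 1/sqrt(n)-style bonus
fast_bonus = 1.0 / n           # faster-decaying 1/n-style bonus analysed in the paper

for count, slow, fast in zip(n, ucb_bonus, fast_bonus):
    print(f"n={count:2d}  1/sqrt(n)={slow:.3f}  1/n={fast:.3f}")
```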