First return, then explore
- URL: http://arxiv.org/abs/2004.12919v6
- Date: Thu, 16 Sep 2021 17:50:10 GMT
- Title: First return, then explore
- Authors: Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley and
Jeff Clune
- Abstract summary: Go-Explore is a family of algorithms that explicitly remembers promising states and first returns to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
- Score: 18.876005532689234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The promise of reinforcement learning is to solve complex sequential decision
problems autonomously by specifying a high-level reward function only. However,
reinforcement learning algorithms struggle when, as is often the case, simple
and intuitive rewards provide sparse and deceptive feedback. Avoiding these
pitfalls requires thoroughly exploring the environment, but creating algorithms
that can do so remains one of the central challenges of the field. We
hypothesise that the main impediment to effective exploration originates from
algorithms forgetting how to reach previously visited states ("detachment") and
from failing to first return to a state before exploring from it
("derailment"). We introduce Go-Explore, a family of algorithms that addresses
these two challenges directly through the simple principles of explicitly
remembering promising states and first returning to such states before
intentionally exploring. Go-Explore solves all heretofore unsolved Atari games
and surpasses the state of the art on all hard-exploration games, with orders
of magnitude improvements on the grand challenges Montezuma's Revenge and
Pitfall. We also demonstrate the practical potential of Go-Explore on a
sparse-reward pick-and-place robotics task. Additionally, we show that adding a
goal-conditioned policy can further improve Go-Explore's exploration efficiency
and enable it to handle stochasticity throughout training. The substantial
performance gains from Go-Explore suggest that the simple principles of
remembering states, returning to them, and exploring from them are a powerful
and general approach to exploration, an insight that may prove critical to the
creation of truly intelligent learning agents.
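To make the remember / return / explore loop described above concrete, here is a minimal exploration-phase sketch in Python. It is an illustration rather than the authors' implementation: the `downscale` cell representation, the gym-style `env.reset` / `env.step` / `env.action_space.sample()` calls, and the `env.save_state()` / `env.restore_state()` snapshot interface are all simplified assumptions, and cell selection here just favours rarely visited cells.

```python
import random
from dataclasses import dataclass

import numpy as np


@dataclass
class CellRecord:
    snapshot: object      # emulator state that restores this cell
    score: float          # best cumulative reward seen when reaching the cell
    visits: int = 0       # how often the cell has been selected for exploration


def downscale(obs):
    """Hypothetical cell representation: coarsely discretize the observation."""
    arr = np.asarray(obs, dtype=np.float64).flatten()
    return tuple((arr[::64] // 32).astype(int).tolist())


def go_explore(env, iterations=1000, explore_steps=100):
    obs = env.reset()
    archive = {downscale(obs): CellRecord(env.save_state(), score=0.0)}

    for _ in range(iterations):
        # 1. Select a promising cell (here: simply favour rarely visited cells).
        _, record = min(archive.items(),
                        key=lambda kv: kv[1].visits + random.random())
        record.visits += 1

        # 2. Return: restore the saved emulator state rather than re-learning the path.
        env.restore_state(record.snapshot)
        score = record.score

        # 3. Explore from the restored state with random actions.
        for _ in range(explore_steps):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            key = downscale(obs)
            # 4. Remember: add new cells, or update cells reached with a better score.
            if key not in archive or score > archive[key].score:
                archive[key] = CellRecord(env.save_state(), score)
            if done:
                break
    return archive
```

Restoring a snapshot corresponds to returning by resetting the simulator to a saved state; when that is not possible, the paper instead returns by replaying the stored trajectory or by training a goal-conditioned policy to reach the chosen cell, which is also what lets Go-Explore handle stochastic environments throughout training.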
Related papers
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z)
- Curiosity-Driven Reinforcement Learning based Low-Level Flight Control [95.42181254494287]
This work proposes a curiosity-driven algorithm for autonomously learning low-level flight control by generating appropriate motor speeds from odometry data.
We ran tests using on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm and visualized the effect of curiosity in evolving exploration patterns.
arXiv Detail & Related papers (2023-07-28T11:46:28Z)
- Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm [0.5156484100374059]
We introduce a novel time-myopic state representation that clusters temporally close states together.
We demonstrate the first learned state representation that reliably estimates novelty, rather than relying on a hand-crafted representation.
Our approach is evaluated on the hard-exploration Atari environments MontezumaRevenge, Gravitar, and Frostbite.
arXiv Detail & Related papers (2023-01-13T16:13:44Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
A key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
- Generative Adversarial Exploration for Reinforcement Learning [48.379457575356454]
In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in reinforcement learning (RL).
In our experiments, we apply GAEX to the games Venture, Montezuma's Revenge, and Super Mario Bros.
To our knowledge, this is the first work to employ GANs in RL exploration problems.
arXiv Detail & Related papers (2022-01-27T17:34:47Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
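As a rough illustration of that criterion, the following is a simplified tabular sketch (a paraphrase of the snippet above, not the paper's code): the bonus is positive only when the next state has been visited less often than the current one, rewarding the agent for stepping beyond the boundary of the explored region. The `counts` table and hashable-state assumption are simplifications; the full method also restricts the bonus to the first visit within an episode and approximates counts for high-dimensional observations.

```python
from collections import defaultdict

def bebold_intrinsic_reward(counts, s, s_next):
    """Clipped difference of inverse visitation counts (tabular sketch).

    counts: maps hashable states to visit counts (e.g. defaultdict(int)).
    The bonus is positive only when s_next is rarer than s, i.e. when the
    agent moves from well-explored into less-explored territory.
    """
    counts[s_next] += 1
    bonus = 1.0 / counts[s_next] - 1.0 / max(counts[s], 1)
    return max(bonus, 0.0)

# Hypothetical usage inside a training loop:
# counts = defaultdict(int)
# r_int = bebold_intrinsic_reward(counts, state, next_state)
# total_reward = r_ext + beta * r_int   # beta scales the intrinsic bonus
```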
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)