First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation
- URL: http://arxiv.org/abs/2212.03251v1
- Date: Tue, 6 Dec 2022 18:56:47 GMT
- Title: First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation
- Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
- Abstract summary: Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
- Score: 7.021281655855703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Go-Explore achieved breakthrough performance on challenging reinforcement
learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that
successful exploration requires an agent to first return to an interesting
state ('Go'), and only then explore into unknown terrain ('Explore'). We refer
to such exploration after a goal is reached as 'post-exploration'. In this
paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We study the isolated potential of post-exploration by turning it on and off within the same algorithm, under both
tabular and deep RL settings on both discrete navigation and continuous control
tasks. Experiments on a range of MiniGrid and Mujoco environments show that
post-exploration indeed helps IMGEP agents reach more diverse states and boosts
their performance. In short, our work suggests that RL researchers should
consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic and easy to implement.
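As a concrete illustration of the 'Go, then post-explore' loop described in the abstract, the sketch below shows one way an IMGEP episode with a post-exploration switch could be structured. This is a minimal sketch based only on the abstract, assuming a gym-style environment and a goal-conditioned agent; all names (`agent.act`, `agent.goal_reached`, `agent.store`, `goal_buffer`) are illustrative assumptions and do not reflect the authors' actual implementation.

```python
import random


def imgep_episode(env, agent, goal_buffer,
                  post_explore=True, max_go_steps=100, post_steps=10):
    """One IMGEP episode: 'Go' towards a sampled goal, then optionally post-explore.

    Hypothetical sketch; interfaces are assumptions, not the paper's code.
    """
    goal = random.choice(goal_buffer)           # sample a previously visited state as goal
    state = env.reset()
    done = False

    # --- 'Go' phase: follow the goal-conditioned policy towards the goal ---
    for _ in range(max_go_steps):
        action = agent.act(state, goal)
        state, reward, done, _ = env.step(action)
        agent.store(state, goal, action, reward)
        if done or agent.goal_reached(state, goal):
            break

    # --- 'Post-explore' phase: keep acting from the reached state ---
    # Turning this block off recovers the plain IMGEP baseline used in the ablation.
    if post_explore and not done:
        for _ in range(post_steps):
            action = env.action_space.sample()  # undirected exploration step
            state, reward, done, _ = env.step(action)
            agent.store(state, goal, action, reward)
            goal_buffer.append(state)           # newly reached states become future goals
            if done:
                break
```

Because post-exploration is just an extra block appended to an ordinary goal-reaching rollout, it can be switched on or off without changing the underlying agent, which is what makes the method-agnostic ablation described in the abstract possible.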
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z) - Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z) - On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - BYOL-Explore: Exploration by Bootstrapped Prediction [49.221173336814225]
BYOL-Explore is a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.
We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark.
arXiv Detail & Related papers (2022-06-16T17:36:15Z) - When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
We refer to such exploration after a goal is reached as 'post-exploration'.
We introduce new methodology to adaptively decide when to post-explore and for how long to post-explore.
arXiv Detail & Related papers (2022-03-29T16:50:12Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z) - First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms built on explicitly remembering promising states and first returning to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)