First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation
- URL: http://arxiv.org/abs/2212.03251v1
- Date: Tue, 6 Dec 2022 18:56:47 GMT
- Title: First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation
- Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat
- Abstract summary: Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
- Score: 7.021281655855703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Go-Explore achieved breakthrough performance on challenging reinforcement
learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that
successful exploration requires an agent to first return to an interesting
state ('Go'), and only then explore into unknown terrain ('Explore'). We refer
to such exploration after a goal is reached as 'post-exploration'. In this
paper, we present a clear ablation study of post-exploration in a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We study the isolated potential of post-exploration by turning it on and off within the same algorithm, under both
tabular and deep RL settings on both discrete navigation and continuous control
tasks. Experiments on a range of MiniGrid and Mujoco environments show that
post-exploration indeed helps IMGEP agents reach more diverse states and boosts
their performance. In short, our work suggests that RL researchers should
consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic and easy to implement.
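As a concrete illustration of the 'Go, then post-explore' loop described in the abstract, the sketch below shows one way an IMGEP episode with a post-exploration switch could be structured. This is a minimal sketch based only on the abstract, assuming a gym-style environment and a goal-conditioned agent; all names (`agent.act`, `agent.goal_reached`, `agent.store`, `goal_buffer`) are illustrative assumptions and do not reflect the authors' actual implementation.

```python
import random


def imgep_episode(env, agent, goal_buffer,
                  post_explore=True, max_go_steps=100, post_steps=10):
    """One IMGEP episode: 'Go' towards a sampled goal, then optionally post-explore.

    Hypothetical sketch; interfaces are assumptions, not the paper's code.
    """
    goal = random.choice(goal_buffer)           # sample a previously visited state as goal
    state = env.reset()
    done = False

    # --- 'Go' phase: follow the goal-conditioned policy towards the goal ---
    for _ in range(max_go_steps):
        action = agent.act(state, goal)
        state, reward, done, _ = env.step(action)
        agent.store(state, goal, action, reward)
        if done or agent.goal_reached(state, goal):
            break

    # --- 'Post-explore' phase: keep acting from the reached state ---
    # Turning this block off recovers the plain IMGEP baseline used in the ablation.
    if post_explore and not done:
        for _ in range(post_steps):
            action = env.action_space.sample()  # undirected exploration step
            state, reward, done, _ = env.step(action)
            agent.store(state, goal, action, reward)
            goal_buffer.append(state)           # newly reached states become future goals
            if done:
                break
```

Because post-exploration is just an extra block appended to an ordinary goal-reaching rollout, it can be switched on or off without changing the underlying agent, which is what makes the method-agnostic ablation described in the abstract possible.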
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z) - Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z) - On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - BYOL-Explore: Exploration by Bootstrapped Prediction [49.221173336814225]
BYOL-Explore is a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.
We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark.
arXiv Detail & Related papers (2022-06-16T17:36:15Z) - When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
We refer to such exploration after a goal is reached as 'post-exploration'.
We introduce new methodology to adaptively decide when to post-explore and for how long to post-explore.
arXiv Detail & Related papers (2022-03-29T16:50:12Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z) - First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms built on explicitly remembering promising states and first returning to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)