When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation
- URL: http://arxiv.org/abs/2203.16311v1
- Date: Tue, 29 Mar 2022 16:50:12 GMT
- Title: When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation
- Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss and Aske Plaat
- Abstract summary: Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
We refer to such exploration after a goal is reached as 'post-exploration'.
We introduce new methodology to adaptively decide when to post-explore and for how long to post-explore.
- Score: 7.021281655855703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Go-Explore achieved breakthrough performance on challenging reinforcement
learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that
successful exploration requires an agent to first return to an interesting
state ('Go'), and only then explore into unknown terrain ('Explore'). We refer
to such exploration after a goal is reached as 'post-exploration'. In this
paper we present a systematic study of post-exploration, answering open
questions that the Go-Explore paper did not answer yet. First, we study the
isolated potential of post-exploration, by turning it on and off within the
same algorithm. Subsequently, we introduce new methodology to adaptively decide
when to post-explore and for how long to post-explore. Experiments on a range
of MiniGrid environments show that post-exploration indeed boosts performance
(with a bigger impact than tuning regular exploration parameters), and this
effect is further enhanced by adaptively deciding when and for how long to
post-explore. In short, our work identifies adaptive post-exploration as a
promising direction for RL exploration research.
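To make the 'Go, then Explore' structure concrete, the sketch below shows what a single rollout with post-exploration could look like. It is a minimal, hypothetical illustration, not the authors' implementation: the Gym-style `env`, the `sample_goal` and `reached` helpers, and the goal-conditioned `goal_policy` are assumptions, and the adaptive decision of when and for how long to post-explore is reduced to two fixed parameters (`post_explore_prob`, `post_explore_steps`).
```python
import random

def rollout_with_post_exploration(env, goal_policy, post_explore_prob=0.5,
                                  post_explore_steps=10, max_steps=200):
    """Sketch of one episode: 'Go' toward a sampled goal, then optionally
    'post-explore' with random actions once the goal has been reached."""
    obs = env.reset()                 # assumed Gym-style interface
    goal = env.sample_goal()          # assumed helper: pick an interesting state
    done = False
    trajectory = []

    # --- Go phase: follow the goal-conditioned policy toward the goal ---
    for _ in range(max_steps):
        action = goal_policy(obs, goal)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if done or env.reached(obs, goal):   # assumed helper: goal test
            break

    # --- Post-exploration phase: instead of ending the episode at the goal,
    #     keep acting (here: randomly) for a number of extra steps ---
    if not done and env.reached(obs, goal) and random.random() < post_explore_prob:
        for _ in range(post_explore_steps):
            action = env.action_space.sample()
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action, reward, next_obs))
            obs = next_obs
            if done:
                break

    return trajectory
```
In the adaptive variant studied in the paper, the decision of whether to post-explore and for how many steps would be made per episode rather than fixed in advance as it is here.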
Related papers
- Deterministic Exploration via Stationary Bellman Error Maximization [6.474106100512158]
Exploration is a crucial and distinctive aspect of reinforcement learning (RL)
In this paper, we introduce three modifications to stabilize the latter and arrive at a deterministic exploration policy.
Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
arXiv Detail & Related papers (2024-10-31T11:46:48Z)
- An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework [2.823645435281551]
Non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals.
The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously.
arXiv Detail & Related papers (2023-05-02T11:08:05Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
- Generative Adversarial Exploration for Reinforcement Learning [48.379457575356454]
In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in reinforcement learning (RL).
In our experiments, we apply GAEX to the games Venture, Montezuma's Revenge, and Super Mario Bros.
To our knowledge, this is the first work to employ GAN in RL exploration problems.
arXiv Detail & Related papers (2022-01-27T17:34:47Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. (A sketch of this intrinsic-reward criterion appears after this list.)
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
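As a companion to the BeBold entry above: read literally, a 'regulated difference of inverse visitation counts' suggests an intrinsic reward of the form max(1/N(s_{t+1}) - 1/N(s_t), 0). The tabular sketch below is a hypothetical illustration under that assumption, not the BeBold implementation; the counting scheme is simplified and hashable states are assumed.
```python
from collections import defaultdict

class RegulatedDifferenceIR:
    """Sketch of an intrinsic reward built from the regulated (clipped)
    difference of inverse visitation counts between successive states."""

    def __init__(self):
        self.counts = defaultdict(int)   # tabular visitation counts

    def reward(self, state, next_state):
        # Simplified counting: both endpoints of the transition are tallied.
        self.counts[state] += 1
        self.counts[next_state] += 1
        # Clip at zero so only moves toward less-visited states are rewarded.
        return max(1.0 / self.counts[next_state] - 1.0 / self.counts[state], 0.0)
```
Such a bonus would typically be added to the sparse environment reward during training.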
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.