First-Explore, then Exploit: Meta-Learning Intelligent Exploration
- URL: http://arxiv.org/abs/2307.02276v1
- Date: Wed, 5 Jul 2023 13:20:21 GMT
- Title: First-Explore, then Exploit: Meta-Learning Intelligent Exploration
- Authors: Ben Norman, Jeff Clune
- Abstract summary: We argue that a core barrier preventing many RL approaches from learning intelligent exploration is that these methods attempt to explore and exploit simultaneously.
We propose a novel meta-RL framework (First-Explore) with two policies: one policy learns to only explore and one policy learns to only exploit.
We demonstrate that First-Explore can learn intelligent exploration strategies such as exhaustive search, and that it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward.
- Score: 4.676074196997298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard reinforcement learning (RL) agents never intelligently explore like
a human (i.e. by taking into account complex domain priors and previous
explorations). Even the most basic intelligent exploration strategies such as
exhaustive search are only inefficiently or poorly approximated by approaches
such as novelty search or intrinsic motivation, let alone more complicated
strategies like learning new skills, climbing stairs, opening doors, or
conducting experiments. This lack of intelligent exploration limits sample
efficiency and prevents solving hard-exploration domains. We argue that a core barrier preventing many RL approaches from learning intelligent exploration is that these methods attempt to explore and exploit simultaneously, which harms both exploration and exploitation because the two goals often conflict. We propose a
novel meta-RL framework (First-Explore) with two policies: one policy learns to
only explore and one policy learns to only exploit. Once trained, we can explore with the explore policy for as long as desired, and then exploit based
on all the information gained during exploration. This approach avoids the
conflict of trying to do both exploration and exploitation at once. We
demonstrate that First-Explore can learn intelligent exploration strategies
such as exhaustive search, and that it outperforms dominant standard
RL and meta-RL approaches on domains where exploration requires sacrificing
reward. First-Explore is a significant step towards creating meta-RL algorithms
capable of learning human-level exploration, which is essential for solving challenging unseen hard-exploration domains.
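To make the two-policy structure concrete, below is a minimal, hypothetical sketch of a First-Explore meta-episode on a toy bandit task. The bandit environment, the hard-coded exhaustive-search and greedy heuristics, and the episode budgets are all illustrative assumptions; the paper's actual policies are learned, context-conditioned networks.

```python
"""Minimal sketch of a First-Explore meta-episode (not the authors' code).

Assumptions: a toy multi-armed bandit stands in for the task distribution,
and hand-written heuristics stand in for the learned context-conditioned
explore and exploit policies.
"""
import random


def make_bandit(n_arms: int):
    """Sample a task: a bandit whose arm means are unknown to the agent."""
    means = [random.random() for _ in range(n_arms)]
    return lambda arm: random.gauss(means[arm], 0.1)


def explore_policy(context, n_arms: int) -> int:
    """Explore-only behaviour: exhaustive search over untried arms."""
    tried = {arm for arm, _ in context}
    untried = [a for a in range(n_arms) if a not in tried]
    return untried[0] if untried else random.randrange(n_arms)


def exploit_policy(context, n_arms: int) -> int:
    """Exploit-only behaviour: greedily pick the best arm seen so far."""
    totals, counts = [0.0] * n_arms, [0] * n_arms
    for arm, reward in context:
        totals[arm] += reward
        counts[arm] += 1
    return max(range(n_arms),
               key=lambda a: totals[a] / counts[a] if counts[a] else float("-inf"))


def first_explore_meta_episode(k_explore=5, k_exploit=5, n_arms=5) -> float:
    env = make_bandit(n_arms)
    context = []  # shared history that both policies condition on
    # Phase 1: explore (reward earned during this phase is deliberately ignored)
    for _ in range(k_explore):
        arm = explore_policy(context, n_arms)
        context.append((arm, env(arm)))
    # Phase 2: exploit, using everything learned during exploration
    return sum(env(exploit_policy(context, n_arms)) for _ in range(k_exploit))


print(first_explore_meta_episode())
```

In the framework itself, both policies would be trained across many sampled tasks, with the exploit policy's return providing the training signal; the sketch only illustrates the two-phase structure, which never forces a single policy to trade off exploring against exploiting.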
Related papers
- An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework [2.823645435281551]
Non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals.
The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously.
arXiv Detail & Related papers (2023-05-02T11:08:05Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
A key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Exploration in Deep Reinforcement Learning: A Comprehensive Survey [24.252352133705735]
Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance.
DRL and deep MARL agents are widely known to be sample-inefficient, and millions of interactions are usually needed even for relatively simple game settings.
This paper provides a comprehensive survey on existing exploration methods in DRL and deep MARL.
arXiv Detail & Related papers (2021-09-14T13:16:33Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Exploration and Incentives in Reinforcement Learning [107.42240386544633]
We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
arXiv Detail & Related papers (2021-02-28T00:15:53Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices [132.49849640628727]
Meta-reinforcement learning (meta-RL) builds agents that can quickly learn new tasks by leveraging prior experience on related tasks.
In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance.
We present DREAM, which avoids local optima in end-to-end training, without sacrificing optimal exploration.
arXiv Detail & Related papers (2020-08-06T17:57:36Z)
- Intrinsic Exploration as Multi-Objective RL [29.124322674133]
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse.
We propose a framework based on multi-objective RL where exploration and exploitation are optimized as separate objectives.
This formulation sets the balance between exploration and exploitation at the policy level, offering advantages over traditional methods.
arXiv Detail & Related papers (2020-04-06T02:37:29Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
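As an illustration of the episodic novelty bonus described in the Never Give Up entry above, here is a small, hypothetical sketch; the raw-state "embedding", the kernel constant, and the episode loop are simplifying assumptions (the actual agent learns embeddings with a self-supervised inverse dynamics model and normalises distances by a running mean).

```python
"""Illustrative k-nearest-neighbour episodic novelty bonus, a sketch in
the spirit of Never Give Up (not its implementation)."""
import math


def episodic_bonus(embedding, memory, k: int = 10, eps: float = 0.1) -> float:
    """Large when the embedding is far from the k nearest embeddings of
    states already visited this episode, small when it is close."""
    if not memory:
        return 1.0
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(embedding, past)) for past in memory
    )[:k]
    # Inverse kernel: nearby past states contribute high similarity
    similarity = sum(eps / (d + eps) for d in dists)
    return 1.0 / math.sqrt(similarity + 1e-8)


# Usage: clear memory at episode start; append each visited state's
# embedding and mix the bonus into the reward used for training.
memory = []
for state in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]:
    print(f"state={state} bonus={episodic_bonus(state, memory):.2f}")
    memory.append(state)
```

Revisiting states near past experience thus yields a shrinking bonus, which is what pushes the directed exploratory policies toward territory the agent has not yet covered.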
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.