Never Give Up: Learning Directed Exploration Strategies
- URL: http://arxiv.org/abs/2002.06038v1
- Date: Fri, 14 Feb 2020 13:57:22 GMT
- Title: Never Give Up: Learning Directed Exploration Strategies
- Authors: Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi,
Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín
Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell
- Abstract summary: We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
- Score: 63.19616370038824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a reinforcement learning agent to solve hard exploration games by
learning a range of directed exploratory policies. We construct an episodic
memory-based intrinsic reward using k-nearest neighbors over the agent's recent
experience to train the directed exploratory policies, thereby encouraging the
agent to repeatedly revisit all states in its environment. A self-supervised
inverse dynamics model is used to train the embeddings of the nearest neighbour
lookup, biasing the novelty signal towards what the agent can control. We
employ the framework of Universal Value Function Approximators (UVFA) to
simultaneously learn many directed exploration policies with the same neural
network, with different trade-offs between exploration and exploitation. By
using the same neural network for different degrees of
exploration/exploitation, we demonstrate transfer from predominantly
exploratory policies to effective exploitative policies. The proposed method
can be incorporated into modern distributed RL agents that collect large
amounts of experience from many actors running in parallel on separate
environment instances. Our method doubles the performance of the base agent
in all hard-exploration games in the Atari-57 suite while maintaining a very
high score across the remaining games, obtaining a median human normalised
score of 1344.0%. Notably, the proposed method is the first algorithm to
achieve non-zero rewards (with a mean score of 8,400) in the game of Pitfall!
without using demonstrations or hand-crafted features.
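For intuition, the episodic bonus described above can be pictured as an inverse-kernel pseudo-count over embeddings stored during the current episode. The sketch below is a minimal illustration of that idea, assuming a gym-style setup; the function name, constants, and the batch-mean normalisation are placeholders rather than the paper's exact implementation.
```python
import numpy as np

def episodic_intrinsic_reward(embedding, episodic_memory, k=10, eps=1e-3, c=1e-3):
    """Illustrative k-nearest-neighbour novelty bonus over an episodic memory
    of controllable-state embeddings (hyper-parameters are placeholders)."""
    if len(episodic_memory) == 0:
        return 1.0
    mem = np.asarray(episodic_memory)
    d2 = np.sum((mem - embedding) ** 2, axis=1)   # squared distances to memory
    nn = np.sort(d2)[:k]                          # k nearest neighbours
    d2_mean = nn.mean() + 1e-8                    # the paper uses a running mean
    kernel = eps / (nn / d2_mean + eps)           # inverse-kernel similarities
    return 1.0 / np.sqrt(kernel.sum() + c)        # large when the state is novel

# The embedding would come from a self-supervised inverse-dynamics model
# trained to predict the action between consecutive observations, so the bonus
# is biased towards parts of the state the agent can control. A UVFA-style
# agent then conditions a single network on a mixing coefficient beta_i, with
# per-policy rewards of the form r = r_extrinsic + beta_i * r_intrinsic.
```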
Related papers
- Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between an exploitation agent and a switching agent (Switcher).
Using a form of policies known as impulse control, the Switcher determines the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
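As a rough illustration of the switching idea, the loop below hands control from an exploitation policy to an exploration policy whenever a state-dependent score crosses a threshold. The gym-style environment API, the `switch_score` function, and the fixed threshold are assumptions for illustration; SEREN instead learns the switching regions with an impulse-control agent.
```python
def run_episode(env, exploit_policy, explore_policy, switch_score,
                threshold=0.5, max_steps=1000):
    """Alternate control between an exploiter and an explorer.

    switch_score(state) stands in for a learned criterion marking the states
    at which control should pass to the exploration policy.
    """
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        policy = explore_policy if switch_score(state) > threshold else exploit_policy
        state, reward, done, _ = env.step(policy(state))
        episode_return += reward
        if done:
            break
    return episode_return
```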
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning, in which agents share a common goal while exploring.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
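Roughly, the goal selection can be pictured as computing normalized visitation entropies over several low-dimensional projections of the state space and targeting the least-explored one. The sketch below is an illustrative reading of that idea, not the authors' implementation; the projection choice, count-based entropies, and goal rule are assumptions.
```python
import numpy as np
from collections import Counter

def select_goal(visited_states, projections):
    """Pick a goal from the projected space with the lowest normalized entropy.

    visited_states: list of state tuples seen so far.
    projections: list of index tuples, each defining a restricted (projected)
                 state space, e.g. [(0,), (1,), (0, 1)].
    """
    best = None
    for dims in projections:
        counts = Counter(tuple(s[d] for d in dims) for s in visited_states)
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        # Normalize by log of the support size so spaces of different
        # cardinality are comparable.
        norm_entropy = -(p * np.log(p)).sum() / max(np.log(len(p)), 1e-8)
        if best is None or norm_entropy < best[0]:
            # Goal: the least-visited projected state in that space.
            rare_state = min(counts, key=counts.get)
            best = (norm_entropy, dims, rare_state)
    return best  # (entropy, chosen projection, goal in that projection)
```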
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
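For background, the successor-feature decomposition referenced in the title writes Q-values as a dot product between policy-specific feature expectations and task reward weights. The sketch below illustrates only that decomposition (names and numbers are made up), not the ITD update that recovers reward weights from demonstrations.
```python
import numpy as np

def q_from_successor_features(psi, w):
    """Q(s, a) = psi(s, a) . w, where psi accumulates expected discounted
    state features under a policy and w are reward weights such that
    r(s, a) is approximately phi(s, a) . w."""
    return psi @ w

# Example: 3 actions, 4-dimensional features (illustrative values only).
psi_sa = np.random.rand(3, 4)              # successor features for one state, per action
w_task = np.array([0.5, 0.0, 1.0, -0.2])   # task-specific reward weights
q_values = q_from_successor_features(psi_sa, w_task)
greedy_action = int(np.argmax(q_values))
```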
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)