Never Give Up: Learning Directed Exploration Strategies
- URL: http://arxiv.org/abs/2002.06038v1
- Date: Fri, 14 Feb 2020 13:57:22 GMT
- Title: Never Give Up: Learning Directed Exploration Strategies
- Authors: Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi,
Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín
Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell
- Abstract summary: We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
- Score: 63.19616370038824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a reinforcement learning agent to solve hard exploration games by
learning a range of directed exploratory policies. We construct an episodic
memory-based intrinsic reward using k-nearest neighbors over the agent's recent
experience to train the directed exploratory policies, thereby encouraging the
agent to repeatedly revisit all states in its environment. A self-supervised
inverse dynamics model is used to train the embeddings of the nearest neighbour
lookup, biasing the novelty signal towards what the agent can control. We
employ the framework of Universal Value Function Approximators (UVFA) to
simultaneously learn many directed exploration policies with the same neural
network, with different trade-offs between exploration and exploitation. By
using the same neural network for different degrees of
exploration/exploitation, we demonstrate transfer from predominantly
exploratory policies to effective exploitative policies. The proposed method
can be incorporated into modern distributed RL agents that collect large
amounts of experience from many actors running in parallel on separate
environment instances. Our method doubles the performance of the base agent
in all hard-exploration games in the Atari-57 suite while maintaining a very
high score across the remaining games, obtaining a median human normalised
score of 1344.0%. Notably, the proposed method is the first algorithm to
achieve non-zero rewards (with a mean score of 8,400) in the game of Pitfall!
without using demonstrations or hand-crafted features.
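For intuition, the episodic bonus described above can be pictured as an inverse-kernel pseudo-count over embeddings stored during the current episode. The sketch below is a minimal illustration of that idea, assuming a gym-style setup; the function name, constants, and the batch-mean normalisation are placeholders rather than the paper's exact implementation.
```python
import numpy as np

def episodic_intrinsic_reward(embedding, episodic_memory, k=10, eps=1e-3, c=1e-3):
    """Illustrative k-nearest-neighbour novelty bonus over an episodic memory
    of controllable-state embeddings (hyper-parameters are placeholders)."""
    if len(episodic_memory) == 0:
        return 1.0
    mem = np.asarray(episodic_memory)
    d2 = np.sum((mem - embedding) ** 2, axis=1)   # squared distances to memory
    nn = np.sort(d2)[:k]                          # k nearest neighbours
    d2_mean = nn.mean() + 1e-8                    # the paper uses a running mean
    kernel = eps / (nn / d2_mean + eps)           # inverse-kernel similarities
    return 1.0 / np.sqrt(kernel.sum() + c)        # large when the state is novel

# The embedding would come from a self-supervised inverse-dynamics model
# trained to predict the action between consecutive observations, so the bonus
# is biased towards parts of the state the agent can control. A UVFA-style
# agent then conditions a single network on a mixing coefficient beta_i, with
# per-policy rewards of the form r = r_extrinsic + beta_i * r_intrinsic.
```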
Related papers
- Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between an exploitation agent and a switching agent (Switcher).
Using a form of policies known as impulse control, the Switcher determines the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
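As a rough illustration of the switching idea, the loop below hands control from an exploitation policy to an exploration policy whenever a state-dependent score crosses a threshold. The gym-style environment API, the `switch_score` function, and the fixed threshold are assumptions for illustration; SEREN instead learns the switching regions with an impulse-control agent.
```python
def run_episode(env, exploit_policy, explore_policy, switch_score,
                threshold=0.5, max_steps=1000):
    """Alternate control between an exploiter and an explorer.

    switch_score(state) stands in for a learned criterion marking the states
    at which control should pass to the exploration policy.
    """
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        policy = explore_policy if switch_score(state) > threshold else exploit_policy
        state, reward, done, _ = env.step(policy(state))
        episode_return += reward
        if done:
            break
    return episode_return
```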
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning, in which agents share a common goal while exploring.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
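Roughly, the goal selection can be pictured as computing normalized visitation entropies over several low-dimensional projections of the state space and targeting the least-explored one. The sketch below is an illustrative reading of that idea, not the authors' implementation; the projection choice, count-based entropies, and goal rule are assumptions.
```python
import numpy as np
from collections import Counter

def select_goal(visited_states, projections):
    """Pick a goal from the projected space with the lowest normalized entropy.

    visited_states: list of state tuples seen so far.
    projections: list of index tuples, each defining a restricted (projected)
                 state space, e.g. [(0,), (1,), (0, 1)].
    """
    best = None
    for dims in projections:
        counts = Counter(tuple(s[d] for d in dims) for s in visited_states)
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        # Normalize by log of the support size so spaces of different
        # cardinality are comparable.
        norm_entropy = -(p * np.log(p)).sum() / max(np.log(len(p)), 1e-8)
        if best is None or norm_entropy < best[0]:
            # Goal: the least-visited projected state in that space.
            rare_state = min(counts, key=counts.get)
            best = (norm_entropy, dims, rare_state)
    return best  # (entropy, chosen projection, goal in that projection)
```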
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
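For background, the successor-feature decomposition referenced in the title writes Q-values as a dot product between policy-specific feature expectations and task reward weights. The sketch below illustrates only that decomposition (names and numbers are made up), not the ITD update that recovers reward weights from demonstrations.
```python
import numpy as np

def q_from_successor_features(psi, w):
    """Q(s, a) = psi(s, a) . w, where psi accumulates expected discounted
    state features under a policy and w are reward weights such that
    r(s, a) is approximately phi(s, a) . w."""
    return psi @ w

# Example: 3 actions, 4-dimensional features (illustrative values only).
psi_sa = np.random.rand(3, 4)              # successor features for one state, per action
w_task = np.array([0.5, 0.0, 1.0, -0.2])   # task-specific reward weights
q_values = q_from_successor_features(psi_sa, w_task)
greedy_action = int(np.argmax(q_values))
```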
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)