Related papers: Long-Term Exploration in Persistent MDPs

Long-Term Exploration in Persistent MDPs

URL: http://arxiv.org/abs/2109.10173v1
Date: Tue, 21 Sep 2021 13:47:04 GMT
Title: Long-Term Exploration in Persistent MDPs
Authors: Leonid Ugadiarov, Alexey Skrynnik, Aleksandr I. Panov
Abstract summary: We propose an exploration method called Rollback-Explore (RbExplore) In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process. We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
Score: 68.8204255655161
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Exploration is an essential part of reinforcement learning, which restricts the quality of learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which agents during training can roll back to visited states. We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge. At all used levels of the game, our agent outperforms or shows comparable results with state-of-the-art curiosity methods with knowledge-based intrinsic motivation: ICM and RND. An implementation of RbExplore can be found at https://github.com/cds-mipt/RbExplore.

Related papers

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE) RLE combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies. We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z)
First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. Key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state. We refer to such exploration after a goal is reached as 'post-exploration'
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. We refer to such exploration after a goal is reached as 'post-exploration' We introduce new methodology to adaptively decide when to post-explore and for how long to post-explore.
arXiv Detail & Related papers (2022-03-29T16:50:12Z)
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning. The goal is selected from multiple projected state spaces via a normalized entropy-based technique. We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon. We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge. We present a novel approach that plans exploration actions far into the future by using a long-term visitation count. Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.