Time-Myopic Go-Explore: Learning A State Representation for the
Go-Explore Paradigm
- URL: http://arxiv.org/abs/2301.05635v1
- Date: Fri, 13 Jan 2023 16:13:44 GMT
- Title: Time-Myopic Go-Explore: Learning A State Representation for the
Go-Explore Paradigm
- Authors: Marc H\"oftmann, Jan Robine, Stefan Harmeling
- Abstract summary: We introduce a novel time-myopic state representation that clusters temporally close states together.
We demonstrate the first learned state representation that reliably estimates novelty instead of relying on a hand-crafted representation heuristic.
Our approach is evaluated on the hard exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari).
- Score: 0.5156484100374059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Very large state spaces with a sparse reward signal are difficult to explore.
The lack of sophisticated guidance results in poor performance for numerous
reinforcement learning algorithms. In these cases, the commonly used random
exploration is often not helpful. The literature shows that this kind of
environment requires enormous effort to systematically explore large chunks of
the state space. Learned state representations can help here to improve the
search by providing semantic context and building structure on top of the raw
observations. In this work we introduce a novel time-myopic state
representation that clusters temporally close states together while providing a
time prediction capability between them. By adapting this model to the
Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned
state representation that reliably estimates novelty instead of using the
hand-crafted representation heuristic. Our method offers an improved solution
to the detachment problem, which still remains an issue during the Go-Explore
Exploration Phase. We provide evidence that our proposed method covers the
entire state space with respect to all possible time trajectories without
causing disadvantageous conflict-overlaps in the cell archive. Analogous to
native Go-Explore, our approach is evaluated on the hard exploration
environments MontezumaRevenge, Gravitar and Frostbite (Atari) in order to
validate its capabilities on difficult tasks. Our experiments show that
time-myopic Go-Explore is an effective alternative to the domain-engineered
heuristic while also being more general. The source code of the method is
available on GitHub.
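To make the novelty mechanism concrete, below is a minimal sketch (PyTorch; not the authors' released code) of how a time-myopic novelty check for a Go-Explore-style cell archive could look. An encoder embeds observations, a time head predicts a clipped (myopic) time distance between two embeddings, and a state is admitted as a new archive cell only if its predicted time distance to every existing cell exceeds a threshold. Class names, network sizes, the horizon and the threshold are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch (not the authors' released code): a time-myopic
# novelty check for a Go-Explore-style cell archive.  An encoder maps
# observations to embeddings; a time head predicts the normalized (clipped)
# number of environment steps between two embeddings.  A state is treated
# as novel, and added to the archive, when its predicted time distance to
# every archived cell exceeds a threshold.
import torch
import torch.nn as nn


class TimeMyopicModel(nn.Module):
    def __init__(self, obs_dim: int, emb_dim: int = 32, horizon: int = 20):
        super().__init__()
        self.horizon = horizon  # time predictions are myopic: clipped to this many steps
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )
        self.time_head = nn.Sequential(  # predicts normalized time distance in [0, 1]
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def embed(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

    def time_distance(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        # Fraction of the myopic horizon predicted to elapse from z_a to z_b.
        return self.time_head(torch.cat([z_a, z_b], dim=-1)).squeeze(-1)


def is_novel(model: TimeMyopicModel,
             archive_embeddings: list,
             obs: torch.Tensor,
             threshold: float = 0.5) -> bool:
    """Admit `obs` as a new cell iff it is predicted to lie more than
    `threshold * horizon` steps away from every cell already in the archive."""
    z = model.embed(obs)
    for z_cell in archive_embeddings:
        if model.time_distance(z_cell, z) < threshold:
            return False  # temporally close to an existing cell, not novel
    return True
```

In practice the time head would presumably be trained on observation pairs sampled from collected trajectories, using their clipped and normalized step difference as the regression target, so that distances in the learned space reflect temporal reachability rather than pixel similarity.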
Related papers
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z)
- Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs [23.584313644411967]
We study the problem of discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant.
Our results include theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms.
We show that these can be a double-edged sword: making the algorithms more successful when used correctly and causing dramatic failure when used incorrectly.
arXiv Detail & Related papers (2024-04-22T19:46:16Z) - Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms that explicitly remembers promising states and first returns to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)
- Sparse Graphical Memory for Robust Planning [93.39298821537197]
We introduce Sparse Graphical Memory (SGM), a new data structure that stores states and feasible transitions in a sparse memory.
SGM aggregates states according to a novel two-way consistency objective, adapting classic state aggregation criteria to goal-conditioned RL.
We show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks.
arXiv Detail & Related papers (2020-03-13T17:59:32Z)