Time-Myopic Go-Explore: Learning A State Representation for the
Go-Explore Paradigm
- URL: http://arxiv.org/abs/2301.05635v1
- Date: Fri, 13 Jan 2023 16:13:44 GMT
- Title: Time-Myopic Go-Explore: Learning A State Representation for the
Go-Explore Paradigm
- Authors: Marc H\"oftmann, Jan Robine, Stefan Harmeling
- Abstract summary: We introduce a novel time-myopic state representation that clusters temporally close states together.
We demonstrate the first learned state representation that reliably estimates novelty instead of relying on a hand-crafted representation heuristic.
Our approach is evaluated on the hard exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari).
- Score: 0.5156484100374059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Very large state spaces with a sparse reward signal are difficult to explore.
The lack of sophisticated guidance results in poor performance for numerous
reinforcement learning algorithms. In these cases, the commonly used random
exploration is often not helpful. The literature shows that this kind of
environment requires enormous effort to systematically explore large chunks of
the state space. Learned state representations can help here to improve the
search by providing semantic context and building structure on top of the raw
observations. In this work we introduce a novel time-myopic state
representation that clusters temporally close states together while providing a
time prediction capability between them. By adapting this model to the
Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned
state representation that reliably estimates novelty instead of using the
hand-crafted representation heuristic. Our method offers an improved solution
to the detachment problem, which still remains an issue during the Go-Explore
Exploration Phase. We provide evidence that our proposed method covers the
entire state space with respect to all possible time trajectories without
causing disadvantageous conflict-overlaps in the cell archive. Analogous to
native Go-Explore, our approach is evaluated on the hard exploration
environments MontezumaRevenge, Gravitar and Frostbite (Atari) in order to
validate its capabilities on difficult tasks. Our experiments show that
time-myopic Go-Explore is an effective alternative to the domain-engineered
heuristic while also being more general. The source code of the method is
available on GitHub.
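To make the novelty mechanism concrete, below is a minimal sketch (PyTorch; not the authors' released code) of how a time-myopic novelty check for a Go-Explore-style cell archive could look. An encoder embeds observations, a time head predicts a clipped (myopic) time distance between two embeddings, and a state is admitted as a new archive cell only if its predicted time distance to every existing cell exceeds a threshold. Class names, network sizes, the horizon and the threshold are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch (not the authors' released code): a time-myopic
# novelty check for a Go-Explore-style cell archive.  An encoder maps
# observations to embeddings; a time head predicts the normalized (clipped)
# number of environment steps between two embeddings.  A state is treated
# as novel, and added to the archive, when its predicted time distance to
# every archived cell exceeds a threshold.
import torch
import torch.nn as nn


class TimeMyopicModel(nn.Module):
    def __init__(self, obs_dim: int, emb_dim: int = 32, horizon: int = 20):
        super().__init__()
        self.horizon = horizon  # time predictions are myopic: clipped to this many steps
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )
        self.time_head = nn.Sequential(  # predicts normalized time distance in [0, 1]
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def embed(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

    def time_distance(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        # Fraction of the myopic horizon predicted to elapse from z_a to z_b.
        return self.time_head(torch.cat([z_a, z_b], dim=-1)).squeeze(-1)


def is_novel(model: TimeMyopicModel,
             archive_embeddings: list,
             obs: torch.Tensor,
             threshold: float = 0.5) -> bool:
    """Admit `obs` as a new cell iff it is predicted to lie more than
    `threshold * horizon` steps away from every cell already in the archive."""
    z = model.embed(obs)
    for z_cell in archive_embeddings:
        if model.time_distance(z_cell, z) < threshold:
            return False  # temporally close to an existing cell, not novel
    return True
```

In practice the time head would presumably be trained on observation pairs sampled from collected trajectories, using their clipped and normalized step difference as the regression target, so that distances in the learned space reflect temporal reachability rather than pixel similarity.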
Related papers
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models [5.404186221463082]
Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
arXiv Detail & Related papers (2024-05-24T01:45:27Z)
- Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs [23.584313644411967]
We study the problem of discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant.
Our results include theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms.
We show that these can be a double-edged sword: making the algorithms more successful when used correctly and causing dramatic failure when used incorrectly.
arXiv Detail & Related papers (2024-04-22T19:46:16Z) - Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms that explicitly remembers promising states and first returns to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)
- Sparse Graphical Memory for Robust Planning [93.39298821537197]
We introduce Sparse Graphical Memory (SGM), a new data structure that stores states and feasible transitions in a sparse memory.
SGM aggregates states according to a novel two-way consistency objective, adapting classic state aggregation criteria to goal-conditioned RL.
We show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks.
arXiv Detail & Related papers (2020-03-13T17:59:32Z)