Latent World Models For Intrinsically Motivated Exploration
- URL: http://arxiv.org/abs/2010.02302v1
- Date: Mon, 5 Oct 2020 19:47:04 GMT
- Title: Latent World Models For Intrinsically Motivated Exploration
- Authors: Aleksandr Ermolov, Nicu Sebe
- Abstract summary: We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we consider partially observable environments with sparse
rewards. We present a self-supervised representation learning method for
image-based observations, which arranges embeddings respecting temporal
distance of observations. This representation is empirically robust to
stochasticity and suitable for novelty detection from the error of a predictive
forward model. We consider episodic and life-long uncertainties to guide the
To motivate the method, we analyse the exploration problem in a tabular
exploration. We propose to estimate the missing information about the
Partially Observable Labyrinth. We demonstrate the method on image-based hard
exploration environments from the Atari benchmark and report significant
improvement with respect to prior work. The source code of the method and all
the experiments is available at https://github.com/htdt/lwm.
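Below is a minimal sketch of the central mechanism the abstract describes: a forward model operating in a learned latent space, whose prediction error serves as a novelty signal. All class names and dimensions are illustrative assumptions, and only the life-long (not the episodic) uncertainty is shown; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image observation to a latent embedding."""
    def __init__(self, obs_channels=1, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(obs_channels, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(dim))

    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    """Predicts the next latent from the current latent and action."""
    def __init__(self, dim=32, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + n_actions, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, z, a_onehot):
        return self.net(torch.cat([z, a_onehot], dim=-1))

def intrinsic_reward(encoder, fwd, obs, a_onehot, next_obs):
    """Life-long novelty bonus: forward-model error in latent space."""
    with torch.no_grad():
        z_pred = fwd(encoder(obs), a_onehot)
        return (z_pred - encoder(next_obs)).pow(2).mean(dim=-1)
```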
Related papers
- Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning
We show that counts can be derived by averaging samples from the Rademacher distribution.
We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work.
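The coin-flip idea can be checked numerically: label each of a state's n visits with an independent Rademacher sample (+1 or -1); since the label average has mean zero and variance 1/n, its expected square is 1/n, and the count can be recovered from it. The toy below verifies this identity in a tabular setting; the paper itself trains a network to regress these labels, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
true_counts = {"s0": 5, "s1": 50, "s2": 500}

for state, n in true_counts.items():
    # Average over many independent Rademacher labellings to estimate
    # E[avg^2], which equals 1/n for n visits.
    trials = rng.choice([-1.0, 1.0], size=(10_000, n))
    mean_sq = (trials.mean(axis=1) ** 2).mean()
    print(f"{state}: true n = {n:4d}, estimated n = {1.0 / mean_sq:7.1f}")
```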
arXiv Detail & Related papers (2023-06-05T18:56:48Z)
- Learning to Explore Informative Trajectories and Samples for Embodied Perception
Generalizing perception models to unseen embodied tasks is insufficiently studied.
We build a 3D semantic distribution map to train the exploration policy in a self-supervised manner.
With the explored informative trajectories, we propose to select hard samples on trajectories based on the semantic distribution uncertainty.
Experiments show that the perception model fine-tuned with our method outperforms the baselines trained with other exploration policies.
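The hard-sample selection step lends itself to a simple illustration: score each candidate by the entropy of its predicted semantic distribution and keep the most uncertain ones. The sketch below assumes per-sample class distributions are already available; the paper's map construction and policy learning are not reproduced.

```python
import numpy as np

def entropy(p, eps=1e-8):
    return -(p * np.log(p + eps)).sum(axis=-1)

def select_hard_samples(class_probs, k):
    """class_probs: (num_samples, num_classes) predicted distributions."""
    scores = entropy(class_probs)         # high entropy = uncertain = hard
    return np.argsort(scores)[::-1][:k]   # indices of the k hardest samples

probs = np.random.default_rng(1).dirichlet(np.ones(10), size=100)
hard_idx = select_hard_samples(probs, k=8)
```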
arXiv Detail & Related papers (2023-03-20T08:20:04Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
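One plausible way to "locally modulate" predictions by combining temporal maps is a learned pixelwise weighting, sketched below. The module, shapes, and the 1x1-convolution weighting are assumptions for illustration, not TempSAL's actual architecture.

```python
import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    def __init__(self, t_steps=5):
        super().__init__()
        # 1x1 conv produces a pixelwise weight for each temporal map.
        self.weights = nn.Conv2d(t_steps, t_steps, kernel_size=1)

    def forward(self, temporal_maps):
        """temporal_maps: (B, T, H, W) saliency maps for T time intervals."""
        w = torch.softmax(self.weights(temporal_maps), dim=1)
        return (w * temporal_maps).sum(dim=1, keepdim=True)  # (B, 1, H, W)

fused = TemporalFusion()(torch.rand(2, 5, 64, 64))
```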
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning
In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations.
For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty.
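As a rough illustration of the bottleneck recipe (though not the paper's exact sequential objective), an information-bottleneck-style loss trades a predictive term against a compression term. Gaussian forms for both terms are an assumption here.

```python
import torch

def ib_loss(z_mu, z_logvar, z_pred, z_next, beta=1e-3):
    # Prediction term: the current latent should predict the next latent.
    pred = (z_pred - z_next).pow(2).mean()
    # Compression term: KL(q(z|x) || N(0, I)) limits information kept from x.
    kl = 0.5 * (z_mu.pow(2) + z_logvar.exp() - 1.0 - z_logvar).sum(-1).mean()
    return pred + beta * kl

z_mu, z_logvar = torch.zeros(8, 16), torch.zeros(8, 16)
loss = ib_loss(z_mu, z_logvar, torch.randn(8, 16), torch.randn(8, 16))
```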
arXiv Detail & Related papers (2022-09-12T15:41:10Z)
- Residual Overfit Method of Exploration
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
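A hedged reading of the two-point-estimate idea: fit one well-regularized model and one nearly unregularized model on the same data, and treat their disagreement on a candidate action as an exploration bonus. The model class (ridge regression) and the additive bonus form below are assumptions, not the paper's exact estimator.

```python
import numpy as np
from sklearn.linear_model import Ridge

def rome_scores(X_train, y_train, X_candidates):
    tuned = Ridge(alpha=10.0).fit(X_train, y_train)    # regularized fit
    overfit = Ridge(alpha=1e-6).fit(X_train, y_train)  # near-unregularized fit
    base = tuned.predict(X_candidates)
    gap = np.abs(overfit.predict(X_candidates) - base)  # overfitting magnitude
    return base + gap   # explore where the overfit model disagrees most

rng = np.random.default_rng(2)
X, y = rng.normal(size=(50, 4)), rng.normal(size=50)
action = int(np.argmax(rome_scores(X, y, rng.normal(size=(10, 4)))))
```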
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration
Active visual exploration aims to assist an agent with a limited field of view to understand its environment based on partial observations.
We propose the Glimpse-Attend-and-Explore model which employs self-attention to guide the visual exploration instead of task-specific uncertainty maps.
Our model provides encouraging results while being less dependent on dataset bias in driving the exploration.
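Loosely, self-attention can guide exploration by letting candidate glimpse locations attend to the glimpses already taken and scoring the result. The sketch below is a generic illustration under that reading; the dimensions and the scoring head are invented for the example.

```python
import torch
import torch.nn as nn

dim = 32
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
score = nn.Linear(dim, 1)

seen = torch.rand(1, 6, dim)        # embeddings of 6 glimpses taken so far
candidates = torch.rand(1, 9, dim)  # embeddings of 9 candidate locations

# Each candidate attends to what has been seen; the highest-scoring
# attended representation picks the next glimpse location.
ctx, _ = attn(query=candidates, key=seen, value=seen)
next_glimpse = score(ctx).squeeze(-1).argmax(dim=-1)
```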
arXiv Detail & Related papers (2021-08-26T11:41:03Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
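Two of the named ingredients, a learned distance model and a non-parametric topological memory of observations, can be sketched as follows. The classes are illustrative stand-ins, not the actual system from the paper.

```python
import torch
import torch.nn as nn

class DistanceModel(nn.Module):
    """Predicts the traversal distance between two observation embeddings."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * emb_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, z_cur, z_goal):
        return self.head(torch.cat([z_cur, z_goal], dim=-1)).squeeze(-1)

class TopologicalMemory:
    """Stores embeddings; adds a node only when far from all existing ones."""
    def __init__(self, dist_model, add_threshold=5.0):
        self.nodes, self.dist, self.thr = [], dist_model, add_threshold

    def maybe_add(self, z):
        with torch.no_grad():
            if not self.nodes or min(
                    self.dist(z, n).item() for n in self.nodes) > self.thr:
                self.nodes.append(z)

mem = TopologicalMemory(DistanceModel())
mem.maybe_add(torch.randn(1, 64))
```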
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Novelty Search in Representational Space for Sample Efficient Exploration
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives.
Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty.
We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards.
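The intrinsic reward described here has a particularly compact form: the novelty of a state is its distance to the k nearest previously seen embeddings. A minimal sketch, with the encoder and the choice of k left as assumptions:

```python
import numpy as np

def knn_novelty(z, memory, k=10):
    """z: (dim,) current embedding; memory: (n, dim) past embeddings."""
    if len(memory) == 0:
        return 1.0
    d = np.linalg.norm(memory - z, axis=1)
    return float(np.sort(d)[:k].mean())   # far from neighbours = novel

rng = np.random.default_rng(3)
memory = rng.normal(size=(500, 16))
bonus = knn_novelty(rng.normal(size=16), memory, k=10)
```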
arXiv Detail & Related papers (2020-09-28T18:51:52Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
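The underlying bisimulation recipe (as in deep bisimulation for control) trains latent distances to match differences in immediate reward plus a discounted distance between next-step latents. A hedged sketch of that loss; the paper's exact distance and transition terms may differ.

```python
import torch

def bisim_loss(z_i, z_j, r_i, r_j, z_next_i, z_next_j, gamma=0.99):
    dist = (z_i - z_j).abs().sum(-1)   # L1 distance in latent space
    with torch.no_grad():
        # Target: reward difference plus discounted next-latent distance.
        target = (r_i - r_j).abs() + gamma * (z_next_i - z_next_j).abs().sum(-1)
    return ((dist - target) ** 2).mean()
```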
arXiv Detail & Related papers (2020-06-18T17:59:35Z)