Learning to Visually Navigate in Photorealistic Environments Without any
Supervision
- URL: http://arxiv.org/abs/2004.04954v1
- Date: Fri, 10 Apr 2020 08:59:32 GMT
- Title: Learning to Visually Navigate in Photorealistic Environments Without any
Supervision
- Authors: Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin,
Piotr Bojanowski
- Abstract summary: We introduce a novel approach for learning to navigate from image inputs without external supervision or reward.
Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by setting its own goals.
We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.
- Score: 37.22924101745505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to navigate in a realistic setting where an agent must rely solely
on visual inputs is a challenging task, in part because the lack of position
information makes it difficult to provide supervision during training. In this
paper, we introduce a novel approach for learning to navigate from image inputs
without external supervision or reward. Our approach consists of three stages:
learning a good representation of first-person views, then learning to explore
using memory, and finally learning to navigate by setting its own goals. The
model is trained with intrinsic rewards only so that it can be applied to any
environment with image observations. We show the benefits of our approach by
training an agent to navigate challenging photo-realistic environments from the
Gibson dataset with RGB inputs only.
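No reference code accompanies this listing; as a minimal sketch of the kind of intrinsic reward that memory-based, self-supervised exploration can use, the snippet below scores each first-person view by its novelty against an episodic memory of visual embeddings. All names, the encoder, and the threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def novelty_reward(embedding, memory, threshold=0.5):
    """Intrinsic reward: high when the current view embedding is far from
    everything stored in the episodic memory (illustrative sketch only)."""
    if not memory:
        memory.append(embedding)
        return 1.0
    # Cosine similarity to the closest stored view.
    sims = [
        float(np.dot(embedding, m)
              / (np.linalg.norm(embedding) * np.linalg.norm(m) + 1e-8))
        for m in memory
    ]
    max_sim = max(sims)
    # Store the view only if it is sufficiently different from memory.
    if max_sim < threshold:
        memory.append(embedding)
    return 1.0 - max_sim

# Usage with a hypothetical image encoder:
#   memory = []
#   r = novelty_reward(encode(observation), memory)
```

A reward of this shape needs no position information or external supervision, which is what lets the same recipe apply to any environment with image observations.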
Related papers
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure, and transitions, to the agent's egocentric representations for navigation.
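As an unofficial illustration of contrasting egocentric views against semantic maps, a standard InfoNCE loss over paired embeddings could look like the sketch below; the encoders, batch pairing, and temperature are assumptions, not code from the Ego²-Map paper.

```python
import torch
import torch.nn.functional as F

def info_nce(view_emb, map_emb, temperature=0.07):
    """InfoNCE over a batch of paired (egocentric view, semantic map)
    embeddings; row i of each tensor comes from the same location.
    Illustrative sketch, not the Ego^2-Map reference implementation."""
    view_emb = F.normalize(view_emb, dim=1)
    map_emb = F.normalize(map_emb, dim=1)
    logits = view_emb @ map_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(view_emb.size(0), device=view_emb.device)
    # Matching view/map pairs sit on the diagonal; all other entries
    # in the batch act as negatives.
    return F.cross_entropy(logits, targets)
```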
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Using Navigational Information to Learn Visual Representations [7.747924294389427]
We show that using spatial and temporal information in the pretraining stage of contrastive learning can improve the performance of downstream classification.
This work shows that contextual information is an effective and efficient signal for improving representation learning.
arXiv Detail & Related papers (2022-02-10T20:17:55Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
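A minimal sketch of that joint objective, assuming a toy reconstruction-based representation model for concreteness (the paper's actual model, objective, and reward are not reproduced here): the exploration policy is rewarded wherever the representation model still makes large errors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in autoencoder over 64x64 RGB observations; purely an
# illustrative assumption, not the paper's representation model.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
decoder = nn.Sequential(nn.Linear(128, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))

def curiosity_reward(obs):
    """Intrinsic reward = the representation model's current error on the
    visited observation, pushing the policy toward states the model has
    not yet learned (a sketch of the idea, not the paper's code)."""
    with torch.no_grad():
        recon = decoder(encoder(obs))
        return F.mse_loss(recon, obs).item()

reward = curiosity_reward(torch.rand(1, 3, 64, 64))  # dummy RGB batch
```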
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
- Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z)
- Memory-Augmented Reinforcement Learning for Image-Goal Navigation [67.3963444878746]
We present a novel method that leverages a cross-episode memory to learn to navigate.
To avoid overfitting, we propose to use data augmentation on the RGB input during training.
The resulting agent achieves competitive performance from RGB input alone, without access to additional sensors such as position or depth.
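As a hedged sketch of such observation-level augmentation (the transforms and parameter values below are assumptions, not the paper's exact recipe), a standard torchvision pipeline applied on-the-fly to RGB observations could be:

```python
import torch
from torchvision import transforms

# Illustrative augmentation for RGB navigation observations; parameter
# values are placeholders, not the paper's settings.
augment = transforms.Compose([
    transforms.RandomResizedCrop(64, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomGrayscale(p=0.1),
])

obs = torch.rand(3, 64, 64)   # dummy RGB observation in [0, 1]
augmented = augment(obs)      # applied per step during training
```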
arXiv Detail & Related papers (2021-01-13T16:30:20Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)
- VisualEchoes: Spatial Image Representation Learning through Echolocation [97.23789910400387]
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation.
We propose a novel interaction-based representation learning framework that learns useful visual features via echolocation.
Our work opens a new path for representation learning for embodied agents, where supervision comes from interacting with the physical world.
arXiv Detail & Related papers (2020-05-04T16:16:58Z)
- Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image [4.970364068620608]
Actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects.
We employ state representation learning (SRL), first encoding the essential information into a compact representation for subsequent use in RL.
We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation.
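One common way to obtain such a disentangled, compact representation is a beta-VAE objective; the loss below is one plausible instantiation, offered as an assumption for illustration rather than the paper's exact SRL model.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction plus a weighted KL term.
    With beta > 1, the KL pressure encourages disentangled latent
    factors (illustrative sketch, not the paper's implementation)."""
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    # KL( q(z|x) || N(0, I) ) in closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl
```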
arXiv Detail & Related papers (2020-02-27T03:58:51Z)