Environment Predictive Coding for Embodied Agents
- URL: http://arxiv.org/abs/2102.02337v1
- Date: Wed, 3 Feb 2021 23:43:16 GMT
- Title: Environment Predictive Coding for Embodied Agents
- Authors: Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman
- Abstract summary: We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
- Score: 92.31905063609082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce environment predictive coding, a self-supervised approach to
learn environment-level representations for embodied agents. In contrast to
prior work on self-supervised learning for images, we aim to jointly encode a
series of images gathered by an agent as it moves about in 3D environments. We
learn these representations via a zone prediction task, where we intelligently
mask out portions of an agent's trajectory and predict them from the unmasked
portions, conditioned on the agent's camera poses. By learning such
representations on a collection of videos, we demonstrate successful transfer
to multiple downstream navigation-oriented tasks. Our experiments on the
photorealistic 3D environments of Gibson and Matterport3D show that our method
outperforms the state-of-the-art on challenging tasks with only a limited
budget of experience.
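Conceptually, the zone-prediction task is masked prediction over a pose-annotated sequence of view features. The PyTorch sketch below illustrates that idea; the module sizes, the transformer encoder, and the MSE objective are illustrative assumptions, not the authors' exact architecture or loss.

```python
import torch
import torch.nn as nn

class ZonePredictionSketch(nn.Module):
    """Hedged sketch of the zone-prediction idea: mask a contiguous zone
    of an agent's trajectory and predict its view features from the
    unmasked views, conditioned on camera poses. Sizes, the transformer,
    and the MSE objective are assumptions, not the authors' setup."""

    def __init__(self, feat_dim=128, pose_dim=4, n_heads=4, n_layers=2):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, feat_dim)
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, view_feats, poses, masked):
        # view_feats: (B, T, feat_dim) image features along the trajectory
        # poses:      (B, T, pose_dim) camera poses for every step
        # masked:     (B, T) bool, True inside the hidden zone
        x = torch.where(masked.unsqueeze(-1), self.mask_token, view_feats)
        x = x + self.pose_proj(poses)   # masked steps still keep their pose
        pred = self.encoder(x)
        return nn.functional.mse_loss(pred[masked], view_feats[masked])

feats, poses = torch.randn(2, 8, 128), torch.randn(2, 8, 4)
masked = torch.zeros(2, 8, dtype=torch.bool)
masked[:, 3:6] = True                   # hide a contiguous zone of steps
loss = ZonePredictionSketch()(feats, poses, masked)
```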
Related papers
- Behavioral Cloning via Search in Video PreTraining Latent Space [0.13999481573773073]
We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the selected expert trajectory as long as the distance between the agent's state representation and the expert's does not grow too large.
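The search-and-copy loop can be sketched in a few lines; `get_agent_embed`, `step`, and the distance threshold below are hypothetical placeholders, not the paper's actual interfaces or values.

```python
import numpy as np

def bc_via_search(get_agent_embed, step, expert_embeds, expert_actions,
                  max_dist=1.0, horizon=100):
    """Hedged sketch of behavioral cloning via latent-space search.
    get_agent_embed() returns the agent's current latent state, step(a)
    executes an action; both, like max_dist, are hypothetical stand-ins."""
    t = None
    for _ in range(horizon):
        # Re-run the proximity search when there is no expert segment to
        # follow, or the representations have diverged past the threshold.
        if (t is None or t >= len(expert_actions) or
                np.linalg.norm(expert_embeds[t] - get_agent_embed()) > max_dist):
            t = int(np.argmin(
                np.linalg.norm(expert_embeds - get_agent_embed(), axis=1)))
        step(expert_actions[t])   # copy the expert's action verbatim
        t += 1                    # advance along the chosen trajectory
```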
arXiv Detail & Related papers (2022-12-27T00:20:37Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
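As a rough illustration of attention driven by motion, the greedy frame-difference rule below picks the patch that changed most between consecutive frames; the paper's attention trajectory is stochastic and learned, so this is only a simplification.

```python
import numpy as np

def attended_location(prev_frame, frame, patch=16):
    """Hedged sketch of motion-driven attention: return the top-left
    corner of the patch that changed most between consecutive frames.
    This greedy rule is only a stand-in for the learned mechanism."""
    # prev_frame, frame: (H, W, C) arrays from the video stream.
    motion = np.abs(frame.astype(float) - prev_frame.astype(float)).mean(axis=-1)
    best, best_yx = -1.0, (0, 0)
    for y in range(0, motion.shape[0] - patch + 1, patch):
        for x in range(0, motion.shape[1] - patch + 1, patch):
            score = motion[y:y + patch, x:x + patch].mean()
            if score > best:
                best, best_yx = score, (y, x)
    return best_yx  # learn pixel-wise features from this attended patch
```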
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a 3x improvement in success rate over a model with access to the same sensory suite.
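The keep-estimating-when-unseen behavior amounts to caching the target's position in a fixed world frame; a minimal sketch follows, with illustrative interfaces.

```python
import numpy as np

class TargetMemory:
    """Hedged sketch: once the target is localized, cache its 3D position
    in the world frame so its location can still be estimated after it
    leaves the field of view. Interfaces are illustrative."""

    def __init__(self):
        self.world_xyz = None

    def update(self, detection_cam_xyz, cam_to_world):
        # detection_cam_xyz: target's 3D point in the camera frame, or
        # None when the detector does not see the object this step.
        if detection_cam_xyz is not None:
            p = np.append(detection_cam_xyz, 1.0)       # homogeneous coords
            self.world_xyz = (cam_to_world @ p)[:3]     # 4x4 pose matrix

    def estimate(self, world_to_cam):
        # Re-express the cached world point in the current camera frame,
        # even while the object is occluded or out of view.
        if self.world_xyz is None:
            return None
        return (world_to_cam @ np.append(self.world_xyz, 1.0))[:3]
```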
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
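A parametric scene description might look like the dataclass below; every field name here is a hypothetical stand-in for the paper's actual generation parameters.

```python
from dataclasses import dataclass, field

@dataclass
class SceneConfig:
    """Hedged sketch of a parametric scene description for generating
    continual-learning streams: each field controls one aspect of the
    visual complexity the agent perceives. Field names are illustrative."""
    n_objects: int = 5
    object_classes: list = field(default_factory=lambda: ["chair", "lamp"])
    lighting_intensity: float = 1.0   # photo-realistic renderer setting
    camera_speed: float = 0.2         # how fast the viewpoint moves
    object_motion: bool = True        # dynamic vs. static scene
    episode_length_s: float = 60.0    # a life-long stream = many episodes

# A stream of progressively more complex scenes.
stream = [SceneConfig(n_objects=k, lighting_intensity=0.5 + 0.1 * k)
          for k in range(1, 6)]
```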
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
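One common way to use such a dynamics model for control is random-shooting model-predictive control in the learned latent space; the sketch below assumes that setup, which may differ from the paper's controller, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Hedged sketch: a dynamics model over a learned scene representation
    z, usable for visuomotor control. Architecture is an assumption."""

    def __init__(self, z_dim=64, a_dim=4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(z_dim + a_dim, 128), nn.ReLU(),
                               nn.Linear(128, z_dim))

    def forward(self, z, a):
        return z + self.f(torch.cat([z, a], dim=-1))  # residual next state

def plan(model, z0, z_goal, a_dim=4, n_samples=256, horizon=5):
    # Random-shooting MPC: sample action sequences, roll them out in
    # latent space, and return the first action of the best sequence.
    actions = torch.randn(n_samples, horizon, a_dim)
    z = z0.expand(n_samples, -1)          # z0, z_goal: (1, z_dim)
    for t in range(horizon):
        z = model(z, actions[:, t])
    best = torch.argmin(((z - z_goal) ** 2).sum(dim=-1))
    return actions[best, 0]               # execute, observe, re-plan

action = plan(LatentDynamics(), torch.randn(1, 64), torch.randn(1, 64))
```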
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding [0.0]
We conduct an analysis of the applicability of self-supervised learning in remote sensing image classification.
We show that, for the downstream task of remote sensing image classification, using self-supervised pre-training can give better results than using supervised pre-training on images of natural scenes.
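Contrastive multiview coding pulls together the embeddings of two views of the same scene and pushes them away from other scenes in the batch; a generic InfoNCE loss, sketched below, is a reasonable stand-in for the objective, though the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.1):
    """Hedged sketch of a contrastive multiview objective: embeddings of
    two views of the same remote-sensing scene are the positives; all
    other pairs in the batch are negatives."""
    a = F.normalize(view_a, dim=1)        # (B, D) embeddings of view 1
    b = F.normalize(view_b, dim=1)        # (B, D) embeddings of view 2
    logits = a @ b.t() / temperature      # pairwise cosine similarities
    labels = torch.arange(a.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```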
arXiv Detail & Related papers (2021-04-14T18:25:43Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for interaction exploration.
We demonstrate our idea with AI2-iTHOR.
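One simple way to reward interaction exploration is a novelty bonus over (object, action) pairs; this reward shape is an assumption for illustration, not the paper's actual scheme.

```python
def interaction_reward(obj_id, action, seen):
    """Hedged sketch of a reward for interaction exploration: pay a bonus
    the first time each (object, action) interaction succeeds, e.g.
    opening a particular fridge in AI2-iTHOR. Purely illustrative."""
    if (obj_id, action) not in seen:
        seen.add((obj_id, action))   # mark this interaction as discovered
        return 1.0                   # novelty bonus for a new interaction
    return 0.0                       # nothing new; keep exploring

seen = set()
assert interaction_reward("fridge_1", "open", seen) == 1.0
assert interaction_reward("fridge_1", "open", seen) == 0.0
```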
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
- Learning to Visually Navigate in Photorealistic Environments Without any Supervision [37.22924101745505]
We introduce a novel approach for learning to navigate from image inputs without external supervision or reward.
Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by having the agent set its own goals.
We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.
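The three stages compose naturally into an agent that builds a memory of view embeddings while exploring and then samples its own goals from it; the sketch below is illustrative, with the stage-1 view encoder assumed to exist and all interfaces invented for the example.

```python
import random
import numpy as np

class SelfSupervisedNavigator:
    """Hedged sketch of the three-stage idea above; the view encoder from
    stage 1 is assumed to already exist, and all interfaces here are
    hypothetical placeholders."""

    def __init__(self):
        self.memory = []                   # stage 2: embeddings of explored views

    def explore(self, view_embed):
        self.memory.append(view_embed)     # remember where we have been

    def set_own_goal(self):
        return random.choice(self.memory)  # stage 3: self-generated goal

    def act(self, goal, candidate_embeds):
        # Greedily move toward the neighboring view whose embedding is
        # closest to the self-chosen goal; no external reward is used.
        dists = [np.linalg.norm(c - goal) for c in candidate_embeds]
        return int(np.argmin(dists))
```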
arXiv Detail & Related papers (2020-04-10T08:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.