Learning with a Mole: Transferable latent spatial representations for
navigation without reconstruction
- URL: http://arxiv.org/abs/2306.03857v2
- Date: Fri, 29 Sep 2023 12:37:36 GMT
- Title: Learning with a Mole: Transferable latent spatial representations for
navigation without reconstruction
- Authors: Guillaume Bono, Leonid Antsfeld, Assem Sadek, Gianluca Monaci,
Christian Wolf
- Abstract summary: In most end-to-end learning approaches the representation is latent and usually does not have a clearly defined interpretation.
In this work we propose to learn an actionable representation of the scene independently of the targeted downstream task.
The learned representation is optimized by a blind auxiliary agent trained to navigate with it on multiple short sub episodes branching out from a waypoint.
- Score: 12.845774297648736
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Agents navigating in 3D environments require some form of memory, which
should hold a compact and actionable representation of the history of
observations useful for decision taking and planning. In most end-to-end
learning approaches the representation is latent and usually does not have a
clearly defined interpretation, whereas classical robotics addresses this with
scene reconstruction resulting in some form of map, usually estimated with
geometry and sensor models and/or learning. In this work we propose to learn an
actionable representation of the scene independently of the targeted downstream
task and without explicitly optimizing reconstruction. The learned
representation is optimized by a blind auxiliary agent trained to navigate with
it on multiple short sub episodes branching out from a waypoint and, most
importantly, without any direct visual observation. We argue and show that the
blindness property is important and forces the (trained) latent representation
to be the only means for planning. With probing experiments we show that the
learned representation optimizes navigability and not reconstruction. On
downstream tasks we show that it is robust to changes in distribution, in
particular the sim2real gap, which we evaluate with a real physical robot in a
real office building, significantly improving performance.
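As a rough illustration of this setup, the sketch below is hypothetical code, not the authors' implementation; all names (SceneEncoder, BlindPolicy, rel_goal) and the imitation-style supervision are assumptions. It shows the general idea: a visual encoder builds a latent scene code from observations gathered before a waypoint, while a blind auxiliary policy receives only that latent and a relative sub-goal, so the navigation loss, rather than any reconstruction loss, shapes the representation.

import torch
import torch.nn as nn

class SceneEncoder(nn.Module):
    """Aggregates a short history of image features into a latent scene code."""
    def __init__(self, feat_dim=256, latent_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, latent_dim, batch_first=True)

    def forward(self, obs_feats):             # obs_feats: (B, T, feat_dim)
        _, h = self.rnn(obs_feats)
        return h[-1]                           # latent scene code, (B, latent_dim)

class BlindPolicy(nn.Module):
    """Predicts actions from the latent code and a relative sub-goal only (no images)."""
    def __init__(self, latent_dim=128, goal_dim=2, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, latent, rel_goal):
        return self.net(torch.cat([latent, rel_goal], dim=-1))

encoder, policy = SceneEncoder(), BlindPolicy()
optim = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()), lr=3e-4)

# One update on dummy data standing in for a batch of sub-episodes branching
# out from a waypoint, supervised here with reference actions for simplicity.
obs_feats = torch.randn(8, 10, 256)            # image features observed before the waypoint
rel_goal = torch.randn(8, 2)                   # sub-goal expressed relative to the waypoint
target_a = torch.randint(0, 4, (8,))           # reference actions for one sub-episode step

logits = policy(encoder(obs_feats), rel_goal)  # the policy itself never sees any image
loss = nn.functional.cross_entropy(logits, target_a)
optim.zero_grad()
loss.backward()
optim.step()

Because gradients reach the encoder only through the blind policy's navigation objective, the latent is pushed to encode whatever is needed to plan paths rather than to reproduce pixels.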
Related papers
- What Makes Pre-Trained Visual Representations Successful for Robust
Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
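For reference only, a generic contrastive (InfoNCE) objective of the kind described above could align egocentric-view and semantic-map embeddings as in the minimal sketch below; the embedding sizes and helper name are assumptions, and this is not the paper's Ego²-Map loss.

import torch
import torch.nn.functional as F

def info_nce(view_emb, map_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th egocentric view should match the i-th map crop."""
    view_emb = F.normalize(view_emb, dim=-1)
    map_emb = F.normalize(map_emb, dim=-1)
    logits = view_emb @ map_emb.t() / temperature   # (B, B) cosine-similarity logits
    targets = torch.arange(view_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Dummy embeddings standing in for encoded views and semantic map crops.
loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))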
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation [4.127128889779478]
This work focuses on performing better than, or comparably to, existing learning-based solutions for visual navigation with autonomous agents.
We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation.
We conduct experiments on PointGoal visual navigation in 3-D reconstructed indoor environments and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)
- S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency [11.357804868755155]
We advocate semantic 3D keypoints as a visual representation, and present a semi-supervised training objective.
Unlike local texture-based approaches, our model integrates contextual information from a large area.
We demonstrate that this ability to locate semantic keypoints enables high level scripting of human understandable behaviours.
arXiv Detail & Related papers (2020-09-30T14:44:54Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
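For reference (a standard formulation in this line of work, not quoted from the abstract), a bisimulation metric between two states combines their reward gap with a Wasserstein distance between their transition distributions,

    d(s_i, s_j) = |r(s_i) - r(s_j)| + γ · W_1( P(·|s_i), P(·|s_j) ),

and the encoder is then trained so that distances between latent codes approximate d, which is what lets the representation discard task-irrelevant visual detail.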
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Mutual Information Maximization for Robust Plannable Representations [82.83676853746742]
We present MIRO, an information-theoretic representation learning algorithm for model-based reinforcement learning.
We show that our approach is more robust than reconstruction objectives in the presence of distractors and cluttered scenes.
arXiv Detail & Related papers (2020-05-16T21:58:47Z)
- Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [40.49380547487908]
We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
arXiv Detail & Related papers (2020-02-27T18:40:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.