Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning
to Segment Driveability in Egocentric Images
- URL: http://arxiv.org/abs/2109.07245v1
- Date: Wed, 15 Sep 2021 12:25:56 GMT
- Title: Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning
to Segment Driveability in Egocentric Images
- Authors: Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund
and Rikke Gade
- Abstract summary: This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera.
We segment egocentric images directly in terms of how a robot can navigate in them, and tailor the learning problem to an autonomous navigation task.
We present a generic and scalable affordance-based definition consisting of 3 driveability levels which can be applied to arbitrary scenes.
- Score: 25.350677396144075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work tackles scene understanding for outdoor robotic navigation, solely
relying on images captured by an on-board camera. Conventional visual scene
understanding interprets the environment based on specific descriptive
categories. However, such a representation is not directly interpretable for
decision-making and constrains robot operation to a specific domain. Thus, we
propose to segment egocentric images directly in terms of how a robot can
navigate in them, and tailor the learning problem to an autonomous navigation
task. Building around an image segmentation network, we present a generic and
scalable affordance-based definition consisting of 3 driveability levels which
can be applied to arbitrary scenes. By encoding these levels with soft ordinal
labels, we incorporate inter-class distances during learning which improves
segmentation compared to standard one-hot labelling. In addition, we propose a
navigation-oriented pixel-wise loss weighting method which assigns higher
importance to safety-critical areas. We evaluate our approach on large-scale
public image segmentation datasets spanning off-road and urban scenes. In a
zero-shot cross-dataset generalization experiment, we show that our affordance
learning scheme can be applied across a diverse mix of datasets and improves
driveability estimation in unseen environments compared to general-purpose,
single-dataset segmentation.
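To make the two learning-side ideas concrete, the soft ordinal encoding of the 3 driveability levels and the navigation-oriented pixel-wise loss weighting, the sketch below shows one way they might be implemented in PyTorch. It is a minimal, hypothetical illustration rather than the authors' code: the SORD-style softmax over negative squared ordinal distance, the alpha scale, and the bottom-of-image weighting ramp are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): soft ordinal driveability
# targets and a simple navigation-oriented pixel weighting, in PyTorch.
import torch
import torch.nn.functional as F

NUM_LEVELS = 3  # e.g. 0 = driveable, 1 = possibly driveable, 2 = non-driveable

def soft_ordinal_targets(level_map, alpha=2.0):
    # (H, W) integer levels -> (K, H, W) soft targets whose mass decays with
    # ordinal distance from the true level (SORD-style; alpha is an assumption).
    levels = torch.arange(NUM_LEVELS, dtype=torch.float32)            # (K,)
    dist = (levels[:, None, None] - level_map.float()[None]) ** 2     # (K, H, W)
    return F.softmax(-alpha * dist, dim=0)

def navigation_weights(height, width, min_w=1.0, max_w=5.0):
    # Per-pixel weights that grow toward the bottom of the egocentric image,
    # i.e. the area nearest the robot; a stand-in for the paper's
    # safety-critical weighting, whose exact criterion is not reproduced here.
    rows = torch.linspace(min_w, max_w, height)                       # (H,)
    return rows[:, None].expand(height, width)                        # (H, W)

def weighted_soft_cross_entropy(logits, level_map):
    # logits: (K, H, W) raw network output; level_map: (H, W) integer labels.
    targets = soft_ordinal_targets(level_map)                         # (K, H, W)
    weights = navigation_weights(*level_map.shape)                    # (H, W)
    per_pixel = -(targets * F.log_softmax(logits, dim=0)).sum(dim=0)  # (H, W)
    return (weights * per_pixel).sum() / weights.sum()

# Toy usage on a random 4x6 "image".
level_map = torch.randint(0, NUM_LEVELS, (4, 6))
logits = torch.randn(NUM_LEVELS, 4, 6)
print(weighted_soft_cross_entropy(logits, level_map))
```

Compared with one-hot labels, the soft targets above keep more residual mass on the ordinally adjacent level, so confusing "driveable" with "non-driveable" is penalized more heavily than confusing it with "possibly driveable", which is the inter-class distance effect described in the abstract.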
Related papers
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning [93.6645991946674]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete understanding of the surroundings provides a mobile agent with the maximum amount of information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2022-06-21T20:07:15Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Polyline Based Generative Navigable Space Segmentation for Autonomous Visual Navigation [57.3062528453841]
We propose a representation-learning-based framework to enable robots to learn the navigable space segmentation in an unsupervised manner.
We show that the proposed PSV-Nets can learn the visual navigable space with high accuracy, even without a single label.
arXiv Detail & Related papers (2021-10-29T19:50:48Z)
- SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation [22.718908677552196]
Zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level.
Relative Positional Encoding integrates spatial information at the feature level and can handle arbitrary image sizes.
Annealed Self-Training can automatically assign different importance to pseudo-labels.
arXiv Detail & Related papers (2021-08-27T22:18:24Z)
- Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our proposed approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes mutual-information maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)