RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training
- URL: http://arxiv.org/abs/2002.02200v1
- Date: Thu, 6 Feb 2020 11:16:24 GMT
- Title: RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training
- Authors: Jean Lahoud, Bernard Ghanem
- Abstract summary: We propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method.
In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors.
We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training.
- Score: 77.62171090230986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although well-known large-scale datasets, such as ImageNet, have driven image
understanding forward, most of these datasets require extensive manual
annotation and are thus not easily scalable. This limits the advancement of
image understanding techniques. The impact of these large-scale datasets can be
observed in almost every vision task and technique in the form of pre-training
for initialization. In this work, we propose an easily scalable and
self-supervised technique that can be used to pre-train any semantic RGB
segmentation method. In particular, our pre-training approach makes use of
automatically generated labels that can be obtained using depth sensors. These
labels, denoted by HN-labels, represent different height and normal patches,
which allow mining of local semantic information that is useful in the task of
semantic RGB segmentation. We show how our proposed self-supervised
pre-training with HN-labels can be used to replace ImageNet pre-training, while
using 25x fewer images and without requiring any manual labeling. We pre-train a
semantic segmentation network with our HN-labels, which resembles our final
task more than pre-training on a less related task, e.g. classification with
ImageNet. We evaluate on two datasets (NYUv2 and CamVid), and we show how the
similarity in tasks is advantageous not only in speeding up the pre-training
process, but also in achieving better final semantic segmentation accuracy than
ImageNet pre-training.
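To make the HN-label idea concrete, here is a minimal sketch (Python/NumPy) of how per-pixel height-and-normal labels could be derived from a depth map. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy); the quantile bin edges, the six coarse normal classes, and the gravity-alignment shortcut are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def hn_labels(depth, fx, fy, cx, cy, n_height_bins=4):
    """Derive coarse HN (height + normal) pseudo-labels from a depth map.
    depth: (H, W) metric depth. Returns an (H, W) integer label map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each pixel to camera coordinates (pinhole model).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)              # (H, W, 3)
    # Surface normals from the cross product of local tangent vectors.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    # Height proxy: negative camera-frame y (assumes a roughly upright
    # camera; the paper aligns heights with gravity instead).
    height = -y
    edges = np.quantile(height, np.linspace(0, 1, n_height_bins + 1)[1:-1])
    hb = np.digitize(height, edges)                     # 0..n_height_bins-1
    # Coarse normal classes: dominant axis and its sign (6 classes).
    axis = np.abs(n).argmax(axis=-1)
    sign = (np.take_along_axis(n, axis[..., None], axis=-1)[..., 0] > 0)
    nb = axis * 2 + sign.astype(int)
    return hb * 6 + nb                                  # joint HN label
```

A segmentation network can then be pre-trained on such labels with an ordinary per-pixel cross-entropy loss, which is what makes the pretext task structurally close to the downstream RGB segmentation task.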
Related papers
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm to train semantic segmentation models.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
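One simple way to realize the query-point supervision described in the entry above is a partial cross-entropy loss that scores only the annotated pixels. A minimal PyTorch sketch, not the authors' exact objective:

```python
import torch.nn.functional as F

def point_supervised_loss(logits, point_labels, ignore_index=255):
    """Cross-entropy over sparsely annotated pixels only.
    logits: (B, C, H, W) network outputs.
    point_labels: (B, H, W) filled with ignore_index everywhere
    except at the labeled query points."""
    return F.cross_entropy(logits, point_labels, ignore_index=ignore_index)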
- Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce And High Level Semantic Labels [1.534667887016089]
This work proposes a hybrid unsupervised and supervised learning method to pre-train models applied in Earth observation downstream tasks.
We combine a contrastive approach to pre-train models with a pixel-wise regression pre-text task to predict coarse elevation maps.
arXiv Detail & Related papers (2023-04-13T23:01:11Z)
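A hedged sketch of the hybrid objective above: an InfoNCE-style contrastive term on two augmented views combined with pixel-wise regression of a coarse elevation map. The loss weighting, temperature, and L1 regression choice are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def hybrid_pretext_loss(z1, z2, elev_pred, elev_gt, temperature=0.1, w=0.5):
    """z1, z2: (B, D) embeddings of two views of the same images.
    elev_pred, elev_gt: (B, 1, H, W) predicted / reference elevation maps."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (B, B) view similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    contrastive = F.cross_entropy(logits, targets)  # match each view pair
    regression = F.l1_loss(elev_pred, elev_gt)      # elevation pretext task
    return contrastive + w * regression
```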
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
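The relative-location pretext above might look like the following sketch: given features of a reference patch and a query patch, a small head regresses the query's (row, col) offset, trained with a standard regression loss. The MLP head is an illustrative assumption, not the paper's transformer architecture.

```python
import torch
import torch.nn as nn

class RelativeLocationHead(nn.Module):
    """Predict the relative (dy, dx) offset of a query patch given
    reference and query patch features."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 2))                  # (dy, dx) offset

    def forward(self, ref_feat, query_feat):    # both (B, dim)
        return self.mlp(torch.cat([ref_feat, query_feat], dim=-1))
```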
- From Explanations to Segmentation: Using Explainable AI for Image Segmentation [1.8581514902689347]
We build upon the advances of the Explainable AI (XAI) community and extract a pixel-wise binary segmentation.
We show that we achieve similar results compared to an established U-Net segmentation architecture.
The proposed method can be trained in a weakly supervised fashion, as the training samples need only be labeled at the image level.
arXiv Detail & Related papers (2022-02-01T10:26:10Z)
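A minimal sketch of turning an image-level explanation into a segmentation, as the entry above proposes: here a class-activation map (CAM) is normalized and thresholded into a binary mask. The CAM formulation and the 0.5 threshold are assumptions; the paper builds on XAI attributions more generally.

```python
import torch
import torch.nn.functional as F

def cam_to_mask(features, class_weights, out_size, thresh=0.5):
    """features: (B, D, h, w) backbone features; class_weights: (D,)
    classifier weights of the target class. Returns (B, H, W) masks."""
    cam = torch.einsum('bdhw,d->bhw', features, class_weights)
    cam = F.interpolate(cam.unsqueeze(1), size=out_size,
                        mode='bilinear', align_corners=False).squeeze(1)
    lo = cam.amin(dim=(1, 2), keepdim=True)
    hi = cam.amax(dim=(1, 2), keepdim=True)
    cam = (cam - lo) / (hi - lo + 1e-8)         # normalize to [0, 1]
    return (cam > thresh).float()               # pixel-wise binary mask
```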
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to better learn an image-level representation.
The local features matching contrastive module is designed to learn representations of local regions, which benefits semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
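A sketch of the two-level contrastive idea behind GLCNet as summarized above: one InfoNCE term on image-level ("global style") embeddings and one on matched local-region embeddings. The shared InfoNCE form and the loss weighting are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    """InfoNCE between row-matched embeddings a, b: (N, D) each."""
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def global_local_loss(g1, g2, l1, l2, w=0.5):
    """g1, g2: (B, D) global embeddings of two augmented views.
    l1, l2: (B * R, D) embeddings of R matched local regions per image."""
    return info_nce(g1, g2) + w * info_nce(l1, l2)
```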
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
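To illustrate "capturing the joint image-label distribution" from the entry above, the sketch below shows a toy generator with two heads, one for the image and one for the per-pixel label map; the architecture is a hypothetical stand-in, not the paper's model.

```python
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    """Toy generator mapping a latent code to an (image, label-map) pair,
    so a GAN discriminator can be trained on joint samples."""
    def __init__(self, z_dim=128, n_classes=21, size=32):
        super().__init__()
        self.n_classes, self.size = n_classes, size
        self.trunk = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU())
        self.img_head = nn.Linear(256, 3 * size * size)
        self.lbl_head = nn.Linear(256, n_classes * size * size)

    def forward(self, z):                       # z: (B, z_dim)
        h = self.trunk(z)
        img = torch.tanh(self.img_head(h)).view(-1, 3, self.size, self.size)
        lbl = self.lbl_head(h).view(-1, self.n_classes, self.size, self.size)
        return img, lbl.softmax(dim=1)          # image + soft label map
```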
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
arXiv Detail & Related papers (2021-02-11T18:54:47Z)
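A minimal sketch of contrasting pixel embeddings against object-mask prototypes, in the spirit of the entry above: each pixel is attracted to the mean embedding of its mask proposal and repelled from the others. Mean pooling and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mask_prototype_loss(feats, mask_ids, temperature=0.3):
    """feats: (N, D) pixel embeddings; mask_ids: (N,) proposal id per pixel."""
    feats = F.normalize(feats, dim=1)
    ids = mask_ids.unique()
    # Prototype = mean embedding of each mask proposal.
    protos = F.normalize(
        torch.stack([feats[mask_ids == i].mean(0) for i in ids]), dim=1)
    logits = feats @ protos.t() / temperature        # (N, K)
    # Target = index of each pixel's own proposal among the prototypes.
    targets = (mask_ids.unsqueeze(1) == ids.unsqueeze(0)).float().argmax(1)
    return F.cross_entropy(logits, targets)
```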
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
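The group-wise mining idea above, with images as graph nodes, might be sketched as one attention-style message-passing layer over pooled image features; the fully connected adjacency and the layer design are assumptions, not the paper's GNN.

```python
import torch
import torch.nn as nn

class GroupGNNLayer(nn.Module):
    """One message-passing step over a group of images, each represented
    as a node by its pooled feature vector."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, x):                       # x: (n_images, dim)
        attn = torch.softmax(x @ x.t() / x.size(1) ** 0.5, dim=-1)
        m = attn @ self.msg(x)                  # aggregate group messages
        return torch.relu(self.upd(torch.cat([x, m], dim=-1)))
```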
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the listed information and is not responsible for any consequences arising from its use.