Three Ways to Improve Semantic Segmentation with Self-Supervised Depth
Estimation
- URL: http://arxiv.org/abs/2012.10782v2
- Date: Mon, 5 Apr 2021 09:46:36 GMT
- Title: Three Ways to Improve Semantic Segmentation with Self-Supervised Depth
Estimation
- Authors: Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc
Van Gool
- Abstract summary: We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
- Score: 90.87105131054419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep networks for semantic segmentation requires large amounts of
labeled training data, which presents a major challenge in practice, as
labeling segmentation masks is a highly labor-intensive process. To address
this issue, we present a framework for semi-supervised semantic segmentation,
which is enhanced by self-supervised monocular depth estimation from unlabeled
image sequences. In particular, we propose three key contributions: (1) We
transfer knowledge from features learned during self-supervised depth
estimation to semantic segmentation, (2) we implement a strong data
augmentation by blending images and labels using the geometry of the scene, and
(3) we utilize the depth feature diversity as well as the level of difficulty
of learning depth in a student-teacher framework to select the most useful
samples to be annotated for semantic segmentation. We validate the proposed
model on the Cityscapes dataset, where all three modules demonstrate
significant performance gains, and we achieve state-of-the-art results for
semi-supervised semantic segmentation. The implementation is available at
https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
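The second contribution, blending images and labels using scene geometry, can be illustrated with a minimal sketch: given two training samples with dense depth maps, pixels of one scene that lie in front of the other are pasted over it, so the mixed image respects occlusion order. This is an illustrative simplification, not the authors' implementation; function names are hypothetical and details such as border handling and unlabeled samples are omitted.

```python
import numpy as np

def depth_blend(img_a, depth_a, labels_a, img_b, depth_b, labels_b):
    """Blend two samples using scene geometry: pixels of scene A that are
    closer to the camera than the corresponding pixels of scene B are
    pasted onto B, and the labels are mixed with the same mask.

    img_*: (H, W, 3) arrays, depth_*: (H, W) arrays, labels_*: (H, W) arrays.
    """
    # Foreground mask: where scene A occludes scene B.
    mask = depth_a < depth_b
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_labels = np.where(mask, labels_a, labels_b)
    mixed_depth = np.minimum(depth_a, depth_b)  # nearest surface wins
    return mixed_img, mixed_labels, mixed_depth
```

Because the mask comes from depth rather than random rectangles, the pasted regions follow object boundaries in 3D, which is what makes this augmentation geometry-aware.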
Related papers
- S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving [12.406655155106424]
We propose S3PT a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training.
Our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals.
Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs.
Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level.
arXiv Detail & Related papers (2024-10-30T15:00:06Z)
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling [14.88236554564287]
In this work, we build upon advances in unsupervised learning by incorporating information about the structure of a scene into the training process.
We achieve this by (1) learning depth-feature correlation, spatially correlating the feature maps with the depth maps to induce knowledge about the structure of the scene, and (2) implementing farthest-point sampling to select relevant features more effectively by applying 3D sampling techniques to the depth information of the scene.
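Farthest-point sampling itself is a standard greedy algorithm: repeatedly pick the point farthest from everything selected so far, which yields a spatially diverse subset. The sketch below is the generic technique, not this paper's exact pipeline; back-projecting the depth map into 3D points is assumed to happen elsewhere.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest-point sampling over an (N, 3) point array.

    Starts from an arbitrary point, then repeatedly selects the point
    with the largest distance to the nearest already-selected point.
    Returns the indices of the k selected points.
    """
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary first point
    # Distance from every point to its nearest selected point so far.
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))  # farthest remaining point
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return np.asarray(selected)
```

The min-distance array makes each iteration O(N), so sampling k points costs O(Nk), which is cheap enough to run per image on depth-derived point sets.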
arXiv Detail & Related papers (2023-09-21T11:47:01Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called the Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantics.
arXiv Detail & Related papers (2022-06-21T17:40:55Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-supervised and domain-adaptive semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z)
- Point-supervised Segmentation of Microscopy Images and Volumes via Objectness Regularization [2.243486411968779]
This work enables the training of semantic segmentation networks on images with only a single annotated point per instance.
We achieve competitive results against the state-of-the-art in point-supervised semantic segmentation on challenging datasets in digital pathology.
arXiv Detail & Related papers (2021-03-09T18:40:00Z)
- A Three-Stage Self-Training Framework for Semi-Supervised Semantic Segmentation [0.9786690381850356]
We propose a holistic solution framed as a three-stage self-training framework for semantic segmentation.
The key idea of our technique is the extraction of statistical information from the pseudo-masks.
We then decrease the uncertainty of the pseudo-masks using a multi-task model that enforces consistency.
arXiv Detail & Related papers (2020-12-01T21:00:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.