SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images
- URL: http://arxiv.org/abs/2101.07422v1
- Date: Tue, 19 Jan 2021 02:41:03 GMT
- Title: SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images
- Authors: Lei He, Jiwen Lu, Guanghui Wang, Shiyu Song, Jie Zhou
- Abstract summary: We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
- Score: 94.36401543589523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation and semantic segmentation play essential roles in scene
understanding. The state-of-the-art methods employ multi-task learning to
simultaneously learn models for these two tasks at the pixel-wise level. They
usually focus on sharing the common features or stitching feature maps from the
corresponding branches. However, these methods lack in-depth consideration on
the correlation of the geometric cues and the scene parsing. In this paper, we
first introduce the concept of semantic objectness to exploit the geometric
relationship of these two tasks through an analysis of the imaging process,
then propose a Semantic Object Segmentation and Depth Estimation Network
(SOSD-Net) based on the objectness assumption. To the best of our knowledge,
SOSD-Net is the first network that exploits the geometry constraint for
simultaneous monocular depth estimation and semantic segmentation. In addition,
considering the mutual implicit relationship between these two tasks, we
exploit the iterative idea from the expectation-maximization algorithm to train
the proposed network more effectively. Extensive experimental results on the
Cityscapes and NYU v2 datasets are presented to demonstrate the superior
performance of the proposed approach.
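The abstract's EM-inspired idea of iterating between two mutually dependent tasks can be illustrated with a toy sketch: two task "heads" (a regression head standing in for depth and a logistic head standing in for segmentation) are updated in alternating phases, each while the other is held fixed. Everything here is an illustrative assumption — the data, losses, and single-weight heads are not the SOSD-Net implementation.

```python
import math
import random

# Toy alternating (EM-style) training of two task heads.
# All data and model shapes are illustrative assumptions.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
y_depth = [2.0 * x for x in xs]              # toy "depth" targets (regression)
y_seg = [1.0 if x > 0 else 0.0 for x in xs]  # toy binary "segmentation" labels

w_depth, w_seg = 0.0, 0.0  # one scalar weight per head
lr = 0.1

def depth_loss(w):
    """Mean squared error of the depth head."""
    return sum((w * x - t) ** 2 for x, t in zip(xs, y_depth)) / len(xs)

def seg_loss(w):
    """Mean binary cross-entropy of the segmentation head."""
    ce = 0.0
    for x, t in zip(xs, y_seg):
        p = 1.0 / (1.0 + math.exp(-w * x))
        ce += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return ce / len(xs)

for step in range(200):
    if step % 2 == 0:
        # phase A: refine the depth head while the segmentation head is fixed
        g = sum(2 * (w_depth * x - t) * x for x, t in zip(xs, y_depth)) / len(xs)
        w_depth -= lr * g
    else:
        # phase B: refine the segmentation head while the depth head is fixed
        g = sum((1.0 / (1.0 + math.exp(-w_seg * x)) - t) * x
                for x, t in zip(xs, y_seg)) / len(xs)
        w_seg -= lr * g

final_depth, final_seg = depth_loss(w_depth), seg_loss(w_seg)
```

In the paper's setting the two heads share one backbone and each task's predictions inform the other's update; the alternation above only mirrors the training schedule, not the geometric coupling.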
Related papers
- SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images [4.269350826756809]
This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera.
The proposed approach is based on a shared encoder-decoder architecture, which integrates various techniques to improve the accuracy of the depth estimation and semantic segmentation task without compromising computational efficiency.
The framework is thoroughly evaluated on two datasets - the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset - and it outperforms existing state-of-the-art methods in both segmentation and depth estimation tasks.
arXiv Detail & Related papers (2024-03-15T20:04:27Z)
- Hybridnet for depth estimation and semantic segmentation [2.781817315328713]
Depth estimation and semantic segmentation are addressed together from a single input image through a hybrid convolutional network.
The proposed HybridNet improves feature extraction by separating the features relevant to one task from those relevant to both.
arXiv Detail & Related papers (2024-02-09T16:52:45Z)
- Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning [63.63516124646916]
We propose a deeply unified framework for depth-aware panoptic segmentation.
We propose a bi-directional guidance learning approach to facilitate cross-task feature learning.
Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.
arXiv Detail & Related papers (2023-07-27T11:28:33Z)
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called the Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantics.
arXiv Detail & Related papers (2022-06-21T17:40:55Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion [12.874712571149725]
Self-supervised learning for depth estimation uses geometry in image sequences for supervision.
We propose a novel depth estimation network, DIFFNet, which makes use of semantic information in the down- and upsampling procedures.
arXiv Detail & Related papers (2021-10-18T17:31:11Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- The Edge of Depth: Explicit Constraints between Segmentation and Depth [25.232436455640716]
We study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images.
We propose to explicitly measure the border consistency between segmentation and depth and minimize it.
Through extensive experiments, our proposed approach advances the state of the art in unsupervised monocular depth estimation on the KITTI benchmark.
arXiv Detail & Related papers (2020-04-01T00:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.