PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
- URL: http://arxiv.org/abs/2206.00468v1
- Date: Wed, 1 Jun 2022 13:00:49 GMT
- Title: PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
- Authors: Naiyu Gao, Fei He, Jian Jia, Yanhu Shan, Haoyang Zhang, Xin Zhao,
Kaiqi Huang
- Abstract summary: We propose a unified framework for depth-aware panoptic segmentation (DPS).
We generate instance-specific kernels to predict depth and segmentation masks for each instance.
We add instance-level depth cues to help supervise depth learning via a new depth loss.
- Score: 41.85216306978024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a unified framework for depth-aware panoptic
segmentation (DPS), which aims to reconstruct the 3D scene with instance-level
semantics from a single image. Prior works address this problem by simply
adding a dense depth regression head to panoptic segmentation (PS) networks,
resulting in two independent task branches. This neglects the mutually
beneficial relations between the two tasks, failing to exploit readily
available instance-level semantic cues to boost depth accuracy and producing
sub-optimal depth maps. To overcome these limitations, we propose a unified
framework for the DPS task that applies a dynamic convolution technique to
both the PS and depth prediction tasks. Specifically, instead of predicting
depth for all pixels at once, we generate instance-specific kernels to
predict depth and segmentation masks for each instance. Moreover, leveraging
the instance-wise depth estimation scheme, we add instance-level depth cues
to help supervise depth learning via a new depth loss. Extensive experiments
on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our
method. We hope our unified solution to DPS can lead to a new paradigm in
this area. Code is available
at https://github.com/NaiyuGao/PanopticDepth.
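The core of the method, as the abstract describes it, is dynamic convolution: one kernel is generated per instance and applied to a shared feature map to produce that instance's segmentation mask and depth map, instead of regressing depth densely in a separate branch. Below is a minimal PyTorch sketch of this instance-wise scheme; the tensor shapes, the kernel layout (mask channels followed by depth channels), and the fusion step are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def instance_masks_and_depth(feats, kernels):
    """Apply per-instance dynamic kernels to a shared feature map.

    feats:   (C, H, W)  shared features from the backbone
    kernels: (N, 2 * C) one generated kernel per instance; the first C
             channels predict the mask logits, the last C the depth map.
    Returns (N, H, W) mask logits and (N, H, W) per-instance depth.
    """
    C, H, W = feats.shape
    N = kernels.shape[0]
    # Dynamic convolution: each instance's kernel acts as a 1x1 conv
    # over the shared features, so a single matmul covers all instances.
    out = kernels.view(N * 2, C) @ feats.view(C, H * W)   # (N*2, H*W)
    out = out.view(N, 2, H, W)
    mask_logits = out[:, 0]                               # (N, H, W)
    depth = F.softplus(out[:, 1])                         # keep depth positive
    return mask_logits, depth

# Toy usage: 64-channel features, 3 instances.
feats = torch.randn(64, 48, 96)
kernels = torch.randn(3, 128)
masks, depth = instance_masks_and_depth(feats, kernels)
# Fuse instance depths into one map by assigning each pixel to its most
# confident instance (a simple stand-in for the paper's fusion step).
owner = masks.argmax(dim=0)                               # (H, W)
panoptic_depth = depth.gather(0, owner.unsqueeze(0)).squeeze(0)
print(masks.shape, depth.shape, panoptic_depth.shape)
```

The new depth loss adds instance-level cues on top of dense supervision; one plausible reading is that each instance's predicted depth is also supervised against per-instance statistics such as a scale or range, but the exact formulation is given in the paper.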
Related papers
- GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints [12.426365333096264]
We propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints.
The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions.
The incorporation of semantic constraints reduces depth discrepancies at object boundaries in indoor self-supervised depth estimation.
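As a rough illustration of the first component, the sketch below builds a weight map from local image gradients, clamped to a floor so textureless regions still receive (reduced) supervision. This is a hedged guess at the general idea from the summary alone, not GAM-Depth's actual formulation; the function name and the floor parameter are invented for illustration.

```python
import torch

def gradient_aware_mask(img, floor=0.1):
    """Weight map from image gradients: high near edges and texture,
    clamped to a floor so textureless regions keep some supervision.

    img: (B, 1, H, W) grayscale image in [0, 1].
    """
    gx = (img[..., :, 1:] - img[..., :, :-1]).abs()   # horizontal gradients
    gy = (img[..., 1:, :] - img[..., :-1, :]).abs()   # vertical gradients
    grad = torch.zeros_like(img)
    grad[..., :, :-1] += gx
    grad[..., :-1, :] += gy
    mask = grad / (grad.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    return mask.clamp(min=floor)

img = torch.rand(2, 1, 32, 32)
w = gradient_aware_mask(img)
per_pixel_loss = torch.rand(2, 1, 32, 32)   # stand-in photometric loss
loss = (w * per_pixel_loss).mean()
```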
arXiv Detail & Related papers (2024-02-22T07:53:34Z)
- Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning [63.63516124646916]
We propose a deeply unified framework for depth-aware panoptic segmentation.
We propose a bi-directional guidance learning approach to facilitate cross-task feature learning.
Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.
arXiv Detail & Related papers (2023-07-27T11:28:33Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims to segment query images given only a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion [18.19171031755595]
We propose a new end-to-end model for performing semantic segmentation and depth completion jointly.
Our approach relies on RGB and sparse depth as inputs to our model and produces a dense depth map and the corresponding semantic segmentation image.
Experiments on the Virtual KITTI 2 dataset provide further evidence that combining semantic segmentation and depth completion in a multi-task network can effectively improve the performance of each task.
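A minimal sketch of such a joint model: a shared encoder over RGB concatenated with sparse depth, and two task heads for dense depth and per-pixel class logits. The layer sizes and the way sparse depth is injected are assumptions for illustration, not the SemSegDepth architecture.

```python
import torch
import torch.nn as nn

class JointSegDepth(nn.Module):
    """Toy joint network: RGB + sparse depth in, dense depth + seg out."""
    def __init__(self, num_classes=19):
        super().__init__()
        # 4 input channels: RGB plus one sparse depth channel (zeros
        # wherever no measurement exists).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(64, 1, 1)           # dense depth
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # per-pixel logits

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        feats = self.encoder(x)
        return self.depth_head(feats).relu(), self.seg_head(feats)

model = JointSegDepth()
rgb = torch.rand(1, 3, 64, 64)
sparse = torch.zeros(1, 1, 64, 64)      # mostly empty LiDAR projection
depth, seg = model(rgb, sparse)
print(depth.shape, seg.shape)           # (1,1,64,64), (1,19,64,64)
```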
arXiv Detail & Related papers (2022-09-01T11:52:11Z)
- PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation [90.26723865198348]
We present PolyphonicFormer, a vision transformer to unify all the sub-tasks under the DVPS task.
Our method explores the relationship between depth estimation and panoptic segmentation via query-based learning.
Our method ranks 1st on the ICCV-2021 BMTT Challenge video + depth track.
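A hedged sketch of the query-based idea: a set of learned queries attends over per-pixel embeddings, and each query decodes both a mask (by dot product with the pixel embeddings) and a depth value for its segment. All module sizes are illustrative, and collapsing each segment's depth to a single scalar is a simplification of what the actual transformer predicts.

```python
import torch
import torch.nn as nn

class QueryMaskDepth(nn.Module):
    """Toy query decoder: each query yields a mask and a segment depth."""
    def __init__(self, dim=64, num_queries=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.depth_proj = nn.Linear(dim, 1)   # one depth scalar per query

    def forward(self, pixel_emb):
        # pixel_emb: (B, H*W, dim) per-pixel embeddings
        B = pixel_emb.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        q, _ = self.attn(q, pixel_emb, pixel_emb)    # refine queries
        masks = q @ pixel_emb.transpose(1, 2)        # (B, Q, H*W) logits
        seg_depth = self.depth_proj(q).exp()         # positive depths
        return masks, seg_depth

dec = QueryMaskDepth()
pix = torch.randn(2, 32 * 32, 64)
masks, seg_depth = dec(pix)
print(masks.shape, seg_depth.shape)   # (2, 8, 1024), (2, 8, 1)
```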
arXiv Detail & Related papers (2021-12-05T14:31:47Z)
- Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation [84.34227665232281]
Domain adaptation for semantic segmentation aims to improve model performance under a distribution shift between the source and target domains.
We leverage the guidance from self-supervised depth estimation, which is available on both domains, to bridge the domain gap.
We demonstrate the effectiveness of our proposed approach on the benchmark tasks SYNTHIA-to-Cityscapes and GTA-to-Cityscapes.
arXiv Detail & Related papers (2021-04-28T07:47:36Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
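The module's stated goal, one network serving both depth prediction (RGB only) and completion (RGB plus sparse measurements), can be sketched as an optional sparse branch whose features are added only when measurements are available. This is an assumed simplification for illustration, not the actual SAN design.

```python
import torch
import torch.nn as nn

class DualModeDepth(nn.Module):
    """Toy net: depth prediction from RGB, completion when sparse depth given."""
    def __init__(self):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        # Auxiliary branch consumed only when sparse measurements exist.
        self.sparse_enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, rgb, sparse_depth=None):
        feats = self.rgb_enc(rgb)
        if sparse_depth is not None:
            feats = feats + self.sparse_enc(sparse_depth)  # inject measurements
        return self.head(feats).relu()

net = DualModeDepth()
rgb = torch.rand(1, 3, 64, 64)
pred = net(rgb)                                  # prediction mode
comp = net(rgb, torch.zeros(1, 1, 64, 64))       # completion mode
print(pred.shape, comp.shape)
```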
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation [31.078913193966585]
We present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision.
ViP-DeepLab approaches it by jointly performing monocular depth estimation and video panoptic segmentation.
On the individual sub-tasks, ViP-DeepLab achieves state-of-the-art results, outperforming previous methods by 5.1% VPQ on Cityscapes-VPS, ranking 1st on the KITTI monocular depth estimation benchmark, and 1st on the KITTI MOTS pedestrian track.
arXiv Detail & Related papers (2020-12-09T19:00:35Z)
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [12.55603878441083]
FADNet is presented to address the task of monocular 3D object detection.
A dedicated depth hint module is designed to generate row-wise features named depth hints.
The contributions of this work are validated by conducting experiments and ablation study on the KITTI benchmark.
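Row-wise depth hints plausibly exploit the correlation between image row and ground-plane depth under a roughly level camera. The sketch below pools backbone features per row and broadcasts a learned per-row prior back across the width; the summary does not specify FADNet's module, so this is only an illustration of the general idea, with all names and sizes invented.

```python
import torch
import torch.nn as nn

class DepthHintModule(nn.Module):
    """Toy row-wise hint: pool each image row, predict a per-row depth prior,
    and broadcast it back as an extra feature channel."""
    def __init__(self, channels=64):
        super().__init__()
        self.row_mlp = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, 1),
        )

    def forward(self, feats):
        # feats: (B, C, H, W); average over width -> one vector per row.
        rows = feats.mean(dim=3).transpose(1, 2)        # (B, H, C)
        hint = self.row_mlp(rows)                       # (B, H, 1)
        # Broadcast the per-row hint across the width as a new channel.
        return hint.transpose(1, 2).unsqueeze(3).expand(
            -1, -1, -1, feats.shape[3])                 # (B, 1, H, W)

m = DepthHintModule()
f = torch.randn(2, 64, 24, 80)
hints = m(f)
print(hints.shape)   # (2, 1, 24, 80)
```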
arXiv Detail & Related papers (2020-11-30T07:19:14Z)
- Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures, which are especially ubiquitous in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z)