X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth
Estimation with Cross-Task Distillation and Boundary Correction
- URL: http://arxiv.org/abs/2309.08424v2
- Date: Sun, 24 Sep 2023 12:12:29 GMT
- Title: X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth
Estimation with Cross-Task Distillation and Boundary Correction
- Authors: Cao Dinh Duc, Jongwoo Lim
- Abstract summary: X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation.
We highlight the current limitations of using the ground-truth boundary to develop a boundary regression loss.
We propose a novel method that exploits depth information to support precise boundary region segmentation.
- Score: 9.215384107659665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmentation of planar regions from a single RGB image is a particularly
important task in the perception of complex scenes. To utilize both visual and
geometric properties in images, recent approaches often formulate the problem
as a joint estimation of planar instances and dense depth through feature
fusion mechanisms and geometric constraint losses. Despite promising results,
these methods do not consider cross-task feature distillation and perform
poorly in boundary regions. To overcome these limitations, we propose X-PDNet,
a framework for the multitask learning of plane instance segmentation and depth
estimation with improvements in the following two aspects. Firstly, we
construct a cross-task distillation design that promotes early information
sharing between the dual tasks for task-specific improvements. Secondly, we
highlight the current limitations of using the ground-truth boundary to develop
a boundary regression loss, and propose a novel method that exploits depth
information to support precise segmentation of boundary regions. Finally, we
manually annotate more than 3000 images from the Stanford 2D-3D-Semantics
dataset and make them available for the evaluation of plane instance
segmentation. In our experiments, the proposed methods outperform the
baseline by large margins in the quantitative results on the ScanNet and
Stanford 2D-3D-S datasets, demonstrating the effectiveness of our proposals.
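
The two mechanisms above are only described at a high level in the abstract. The following is a minimal, hypothetical PyTorch sketch of what they could look like: a gated cross-task distillation block that exchanges intermediate features between the segmentation and depth branches, and a segmentation loss re-weighted near depth discontinuities as a stand-in for the paper's depth-guided boundary correction. All module names, channel sizes, the gating scheme, and the gradient-based boundary weight are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch only; not the official X-PDNet code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossTaskDistillation(nn.Module):
    """Exchanges intermediate features between the segmentation and depth
    branches through learned gates (one plausible reading of 'early
    information sharing between the dual tasks')."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.seg_to_depth = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.depth_to_seg = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, seg_feat: torch.Tensor, depth_feat: torch.Tensor):
        # Each branch is modulated by a gate computed from the other branch.
        seg_out = seg_feat + seg_feat * self.depth_to_seg(depth_feat)
        depth_out = depth_feat + depth_feat * self.seg_to_depth(seg_feat)
        return seg_out, depth_out


def depth_guided_boundary_weight(depth: torch.Tensor, beta: float = 10.0):
    """Weight map emphasizing pixels with strong depth discontinuities,
    which often coincide with plane boundaries (assumed heuristic)."""
    # Finite differences on the depth map, padded back to full resolution.
    dx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1, 0, 0))
    dy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    grad = torch.sqrt(dx ** 2 + dy ** 2 + 1e-8)
    return 1.0 + beta * grad / (grad.max() + 1e-8)


def boundary_aware_seg_loss(seg_logits, seg_target, depth):
    """Per-pixel cross-entropy re-weighted near depth discontinuities."""
    weight = depth_guided_boundary_weight(depth)                       # (B, 1, H, W)
    ce = F.cross_entropy(seg_logits, seg_target, reduction="none")     # (B, H, W)
    return (ce * weight.squeeze(1)).mean()


if __name__ == "__main__":
    seg_f, dep_f = torch.randn(2, 256, 60, 80), torch.randn(2, 256, 60, 80)
    seg_f, dep_f = CrossTaskDistillation()(seg_f, dep_f)
    logits = torch.randn(2, 21, 60, 80, requires_grad=True)
    target = torch.randint(0, 21, (2, 60, 80))
    depth = torch.rand(2, 1, 60, 80)
    boundary_aware_seg_loss(logits, target, depth).backward()
```

In a full model, such a distillation block would sit at one or more shared decoder stages, and the boundary weight could be computed from either predicted or ground-truth depth; the actual X-PDNet design may differ on both counts.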
Related papers
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z) - PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches have focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z) - InverseForm: A Loss Function for Structured Boundary-Aware Segmentation [80.39674800972182]
We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network.
This plug-in loss term complements the cross-entropy loss in capturing boundary transformations.
We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks.
arXiv Detail & Related papers (2021-04-06T18:52:45Z) - SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z) - Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets [21.703238902823937]
We propose a structure-aware neural network with spatial attention blocks to exploit the spatial relationship of visual features.
Second, we introduce a global focal relative loss for uniform point pairs to enhance spatial constraint in the prediction.
Third, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes.
arXiv Detail & Related papers (2020-07-22T08:21:02Z) - Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study [2.6205925938720833]
State of the art methods use deep neural networks to predict semantic classes for each point in a LiDAR scan.
A powerful and efficient way to process LiDAR measurements is to use two-dimensional, image-like projections.
We demonstrate various techniques to boost the performance and to improve runtime as well as memory constraints.
arXiv Detail & Related papers (2020-04-06T11:08:12Z) - Refined Plane Segmentation for Cuboid-Shaped Objects by Leveraging Edge Detection [63.942632088208505]
We propose a post-processing algorithm to align the segmented plane masks with edges detected in the image.
This allows us to increase the accuracy of state-of-the-art approaches, while limiting ourselves to cuboid-shaped objects.
arXiv Detail & Related papers (2020-03-28T18:51:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.