PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for
Piece-Wise Plane Detection and Reconstruction from a Single RGB Image
- URL: http://arxiv.org/abs/2110.11219v1
- Date: Thu, 21 Oct 2021 15:54:03 GMT
- Authors: Yaxu Xie, Fangwen Shu, Jason Rambach, Alain Pagani, Didier Stricker
- Abstract summary: Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
- Score: 11.215334675788952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Piece-wise 3D planar reconstruction provides holistic scene understanding of
man-made environments, especially for indoor scenarios. Most recent approaches
focused on improving the segmentation and reconstruction results by introducing
advanced network architectures but overlooked the dual characteristics of
piece-wise planes as objects and geometric models. Different from other
existing approaches, we start from enforcing cross-task consistency for our
multi-task convolutional neural network, PlaneRecNet, which integrates a
single-stage instance segmentation network for piece-wise planar segmentation
and a depth decoder to reconstruct the scene from a single RGB image. To
achieve this, we introduce several novel loss functions (geometric constraint)
that jointly improve the accuracy of piece-wise planar segmentation and depth
estimation. Meanwhile, a novel Plane Prior Attention module is used to guide
depth estimation with the awareness of plane instances. Exhaustive experiments
are conducted in this work to validate the effectiveness and efficiency of our
method.
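The paper does not spell out its geometric-constraint losses here, but one common instance of such a cross-task consistency term penalizes depth predictions whose back-projected 3D points deviate from a single plane inside each predicted plane mask. The following is a minimal NumPy sketch of that idea under this assumption; the function names and the SVD-based plane fit are illustrative, not the paper's implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map to 3D points: P = depth * K^{-1} [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(depth)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # one viewing ray per pixel
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def plane_consistency_loss(depth, mask, K):
    """Mean distance of masked 3D points to their best-fit plane.

    Zero when all points under `mask` lie on one plane; grows as the
    predicted depth deviates from planarity inside the plane mask.
    """
    pts = backproject(depth, K)[mask.reshape(-1)]
    centroid = pts.mean(axis=0)
    # Plane normal = right-singular vector of the centered point cloud
    # with the smallest singular value (total least squares fit).
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]
    return np.abs((pts - centroid) @ normal).mean()
```

In a training loop, a term like this would be summed over all predicted plane instances and added to the segmentation and depth losses, which is one way the two tasks can regularize each other.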
Related papers
- Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation [38.81275292687583]
We propose Plane2Depth, which adaptively utilizes plane information to improve depth prediction within a hierarchical framework.
In the proposed plane guided depth generator (PGDG), we design a set of plane queries as prototypes to softly model planes in the scene and predict per-pixel plane coefficients.
In the proposed adaptive plane query aggregation (APGA) module, we introduce a novel feature interaction approach to improve the aggregation of multi-scale plane features.
arXiv Detail & Related papers (2024-09-04T07:45:06Z)
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
arXiv Detail & Related papers (2024-07-04T03:02:27Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Multi-task Planar Reconstruction with Feature Warping Guidance [3.95944314850151]
Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image.
We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture.
Our model simultaneously predicts semantics from single images at inference time, while achieving real-time predictions at 43 FPS.
arXiv Detail & Related papers (2023-11-25T09:53:42Z)
- X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction [9.215384107659665]
X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation.
We highlight the current limitations of using the ground truth boundary to develop boundary regression loss.
We propose a novel method that exploits depth information to support precise boundary region segmentation.
arXiv Detail & Related papers (2023-09-15T14:27:54Z)
- PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation [10.982464344805194]
PlaneRecTR++ is a Transformer-based architecture that unifies all sub-tasks related to multi-view reconstruction and pose estimation.
Our proposed unified learning achieves mutual benefits across sub-tasks, obtaining new state-of-the-art performance on the public ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.
arXiv Detail & Related papers (2023-07-25T18:28:19Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.