360 Layout Estimation via Orthogonal Planes Disentanglement and
Multi-view Geometric Consistency Perception
- URL: http://arxiv.org/abs/2312.16268v1
- Date: Tue, 26 Dec 2023 12:16:03 GMT
- Title: 360 Layout Estimation via Orthogonal Planes Disentanglement and
Multi-view Geometric Consistency Perception
- Authors: Zhijie Shen, Chunyu Lin, Junsong Zhang, Lang Nie, Kang Liao, Yao Zhao
- Abstract summary: Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence.
We present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
We also introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation.
- Score: 60.23832277827669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing panoramic layout estimation solutions tend to recover room
boundaries from a vertically compressed sequence, yielding imprecise results as
the compression process often muddles the semantics between various planes.
Besides, these data-driven approaches demand massive data annotations, which
are laborious and time-consuming. For the first problem, we
propose an orthogonal plane disentanglement network (termed DOPNet) to
distinguish ambiguous semantics. DOPNet consists of three modules that are
integrated to deliver distortion-free, semantics-clean, and detail-sharp
disentangled representations, which benefit the subsequent layout recovery. For
the second problem, we present an unsupervised adaptation technique tailored
for horizon-depth and ratio representations. Concretely, we introduce an
optimization strategy for decision-level layout analysis and a 1D cost volume
construction method for feature-level multi-view aggregation, both of which are
designed to fully exploit the geometric consistency across multiple
perspectives. The optimizer provides a reliable set of pseudo-labels for
network training, while the 1D cost volume enriches each view with
comprehensive scene information derived from other perspectives. Extensive
experiments demonstrate that our solution outperforms other SoTA models on both
monocular layout estimation and multi-view layout estimation tasks.
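The horizon-depth representation mentioned in the abstract can be illustrated with a small sketch (hypothetical helper names; assumes an equirectangular panorama and a camera of known height above the floor): for each image column, the floor-boundary row maps to a latitude below the horizon, and the horizontal distance to the wall follows from simple trigonometry.

```python
import numpy as np

def floor_boundary_to_horizon_depth(v, img_h, cam_h=1.6):
    """Hypothetical sketch: map floor-boundary rows `v` of an equirectangular
    panorama (height `img_h`) to per-column horizontal wall distance.

    Rows below the image midline correspond to latitudes in (0, pi/2);
    with the camera `cam_h` above the floor, depth = cam_h / tan(latitude).
    """
    lat = (v / img_h - 0.5) * np.pi  # latitude below the horizon line
    return cam_h / np.tan(lat)

# A wall whose floor boundary sits 3/4 of the way down a 512-row panorama
# lies 45 degrees below the horizon, i.e. at a distance equal to the
# camera height.
depth = floor_boundary_to_horizon_depth(np.array([384.0]), img_h=512, cam_h=1.0)
```

Together with a single room-height ratio, this 1D depth signal suffices to reconstruct both floor and ceiling boundaries, which is one reason the representation lends itself to the multi-view geometric consistency constraints described above.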
Related papers
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth
Estimation with Cross-Task Distillation and Boundary Correction [9.215384107659665]
X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation.
We highlight the current limitations of using the ground truth boundary to develop boundary regression loss.
We propose a novel method that exploits depth information to support precise boundary region segmentation.
arXiv Detail & Related papers (2023-09-15T14:27:54Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads corresponding to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for
Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z)
- Self-supervised Depth Estimation Leveraging Global Perception and
Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features.
A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map.
In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive to those of the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z)
- Weakly But Deeply Supervised Occlusion-Reasoned Parametric Layouts [87.370534321618]
We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input and produces occlusion-reasoned layouts in perspective space.
The only human annotations required by our method are for parametric attributes that are cheaper and less ambiguous to obtain.
We validate our approach on two public datasets, KITTI and NuScenes, to achieve state-of-the-art results with considerably lower human supervision.
arXiv Detail & Related papers (2021-04-14T09:32:29Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth
Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
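LED2-Net's formulation (predicting depth on the horizon line of a panorama) relies on a mapping between layout boundaries and depth that is invertible in closed form and smooth, which is what makes the layout-to-depth conversion differentiable. A minimal sketch of that inverse mapping, with hypothetical names, assuming an equirectangular panorama and a known camera height:

```python
import numpy as np

def horizon_depth_to_boundary_row(depth, img_h, cam_h=1.6):
    """Hypothetical sketch: given per-column wall distance `depth` and camera
    height `cam_h`, recover the floor-boundary row in an equirectangular
    panorama of height `img_h`.

    Every operation here is smooth in `depth`, so gradients from a depth
    loss can flow back to layout parameters -- the core idea behind a
    differentiable layout-to-depth conversion.
    """
    lat = np.arctan(cam_h / depth)      # latitude below the horizon line
    return img_h * (lat / np.pi + 0.5)  # convert latitude back to pixel rows

# A wall at a distance equal to the camera height lies 45 degrees below the
# horizon, i.e. 3/4 of the way down a 512-row panorama.
row = horizon_depth_to_boundary_row(np.array([1.0]), img_h=512, cam_h=1.0)
```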
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.