360 Layout Estimation via Orthogonal Planes Disentanglement and
Multi-view Geometric Consistency Perception
- URL: http://arxiv.org/abs/2312.16268v1
- Date: Tue, 26 Dec 2023 12:16:03 GMT
- Title: 360 Layout Estimation via Orthogonal Planes Disentanglement and
Multi-view Geometric Consistency Perception
- Authors: Zhijie Shen, Chunyu Lin, Junsong Zhang, Lang Nie, Kang Liao, Yao Zhao
- Abstract summary: Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence.
We present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
We also introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation.
- Score: 60.23832277827669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing panoramic layout estimation solutions tend to recover room
boundaries from a vertically compressed sequence, yielding imprecise results as
the compression process often muddles the semantics between various planes.
Besides, these data-driven approaches demand massive data annotations, which
are laborious and time-consuming. For the first problem, we
propose an orthogonal plane disentanglement network (termed DOPNet) to
distinguish ambiguous semantics. DOPNet consists of three modules that are
integrated to deliver distortion-free, semantics-clean, and detail-sharp
disentangled representations, which benefit the subsequent layout recovery. For
the second problem, we present an unsupervised adaptation technique tailored
for horizon-depth and ratio representations. Concretely, we introduce an
optimization strategy for decision-level layout analysis and a 1D cost volume
construction method for feature-level multi-view aggregation, both of which are
designed to fully exploit the geometric consistency across multiple
perspectives. The optimizer provides a reliable set of pseudo-labels for
network training, while the 1D cost volume enriches each view with
comprehensive scene information derived from other perspectives. Extensive
experiments demonstrate that our solution outperforms other SoTA models on both
monocular layout estimation and multi-view layout estimation tasks.
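The horizon-depth representation mentioned in the abstract can be illustrated with a small sketch (hypothetical helper names; assumes an equirectangular panorama and a camera of known height above the floor): for each image column, the floor-boundary row maps to a latitude below the horizon, and the horizontal distance to the wall follows from simple trigonometry.

```python
import numpy as np

def floor_boundary_to_horizon_depth(v, img_h, cam_h=1.6):
    """Hypothetical sketch: map floor-boundary rows `v` of an equirectangular
    panorama (height `img_h`) to per-column horizontal wall distance.

    Rows below the image midline correspond to latitudes in (0, pi/2);
    with the camera `cam_h` above the floor, depth = cam_h / tan(latitude).
    """
    lat = (v / img_h - 0.5) * np.pi  # latitude below the horizon line
    return cam_h / np.tan(lat)

# A wall whose floor boundary sits 3/4 of the way down a 512-row panorama
# lies 45 degrees below the horizon, i.e. at a distance equal to the
# camera height.
depth = floor_boundary_to_horizon_depth(np.array([384.0]), img_h=512, cam_h=1.0)
```

Together with a single room-height ratio, this 1D depth signal suffices to reconstruct both floor and ceiling boundaries, which is one reason the representation lends itself to the multi-view geometric consistency constraints described above.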
Related papers
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth
Estimation with Cross-Task Distillation and Boundary Correction [9.215384107659665]
X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation.
We highlight the current limitations of using the ground truth boundary to develop boundary regression loss.
We propose a novel method that exploits depth information to support precise boundary region segmentation.
arXiv Detail & Related papers (2023-09-15T14:27:54Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads corresponding to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for
Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z)
- Self-supervised Depth Estimation Leveraging Global Perception and
Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features.
A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map.
In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive to those of the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z)
- Weakly But Deeply Supervised Occlusion-Reasoned Parametric Layouts [87.370534321618]
We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input and produces occlusion-reasoned layouts in perspective space.
The only human annotations required by our method are for parametric attributes that are cheaper and less ambiguous to obtain.
We validate our approach on two public datasets, KITTI and NuScenes, to achieve state-of-the-art results with considerably lower human supervision.
arXiv Detail & Related papers (2021-04-14T09:32:29Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth
Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
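LED2-Net's formulation (predicting depth on the horizon line of a panorama) relies on a mapping between layout boundaries and depth that is invertible in closed form and smooth, which is what makes the layout-to-depth conversion differentiable. A minimal sketch of that inverse mapping, with hypothetical names, assuming an equirectangular panorama and a known camera height:

```python
import numpy as np

def horizon_depth_to_boundary_row(depth, img_h, cam_h=1.6):
    """Hypothetical sketch: given per-column wall distance `depth` and camera
    height `cam_h`, recover the floor-boundary row in an equirectangular
    panorama of height `img_h`.

    Every operation here is smooth in `depth`, so gradients from a depth
    loss can flow back to layout parameters -- the core idea behind a
    differentiable layout-to-depth conversion.
    """
    lat = np.arctan(cam_h / depth)      # latitude below the horizon line
    return img_h * (lat / np.pi + 0.5)  # convert latitude back to pixel rows

# A wall at a distance equal to the camera height lies 45 degrees below the
# horizon, i.e. 3/4 of the way down a 512-row panorama.
row = horizon_depth_to_boundary_row(np.array([1.0]), img_h=512, cam_h=1.0)
```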
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.