Related papers: PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment

URL: http://arxiv.org/abs/2508.04659v1
Date: Wed, 06 Aug 2025 17:27:50 GMT
Title: PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment
Authors: Gustav Hanning, Kalle Åström, Viktor Larsson
Abstract summary: We introduce PixCuboid, an optimization-based approach for cuboid-shaped room layout estimation. By training with the optimization end-to-end, we learn feature maps that yield large convergence basins and smooth loss landscapes. In thorough experiments we validate our approach and significantly outperform the competition.
Score: 26.610824644310846
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Coarse room layout estimation provides important geometric cues for many downstream tasks. Current state-of-the-art methods are predominantly based on single views and often assume panoramic images. We introduce PixCuboid, an optimization-based approach for cuboid-shaped room layout estimation, which is based on multi-view alignment of dense deep features. By training with the optimization end-to-end, we learn feature maps that yield large convergence basins and smooth loss landscapes in the alignment. This allows us to initialize the room layout using simple heuristics. For the evaluation we propose two new benchmarks based on ScanNet++ and 2D-3D-Semantics, with manually verified ground truth 3D cuboids. In thorough experiments we validate our approach and significantly outperform the competition. Finally, while our network is trained with single cuboids, the flexibility of the optimization-based approach allows us to easily extend to multi-room estimation, e.g. larger apartments or offices. Code and model weights are available at https://github.com/ghanning/PixCuboid.

Related papers

PanSt3R: Multi-view Consistent Panoptic Segmentation [10.781185925397493]
We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal. We propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization. PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks.
arXiv Detail & Related papers (2025-06-26T15:02:00Z)
LinPrim: Linear Primitives for Differentiable Volumetric Rendering [53.780682194322225]
We introduce two new scene representations based on linear primitives. We present a differentiable rasterizer that runs efficiently on GPU. We demonstrate comparable performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-27T18:49:38Z)
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results. We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
MCTS with Refinement for Proposals Selection Games in Scene Understanding [32.92475660892122]
We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm. From a generated pool of proposals, our method jointly selects and optimizes proposals that maximize the objective term. Our method shows high performance on the Matterport3D dataset without introducing hard constraints on room layout configurations.
arXiv Detail & Related papers (2022-07-07T10:15:54Z)
Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments. TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features. A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map. In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive with that of state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z)
Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views. We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video. Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer. To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
Joint Multi-Dimension Pruning via Numerical Gradient Update [120.59697866489668]
We present joint multi-dimension pruning (abbreviated as JointPruning), an effective method of pruning a network on three crucial aspects: spatial, depth and channel simultaneously. We show that our method is optimized collaboratively across the three dimensions in a single end-to-end training and it is more efficient than the previous exhaustive methods.
arXiv Detail & Related papers (2020-05-18T17:57:09Z)
General 3D Room Layout from a Single View by Render-and-Compare [36.94817376590415]
We present a novel method to reconstruct the 3D layout of a room from a single perspective view. Our dataset consists of 293 images from ScanNet, which we annotated with precise 3D layouts.
arXiv Detail & Related papers (2020-01-07T16:14:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.