An End-to-End Room Geometry Constrained Depth Estimation Framework for Indoor Panorama Images
- URL: http://arxiv.org/abs/2510.07817v1
- Date: Thu, 09 Oct 2025 05:52:48 GMT
- Title: An End-to-End Room Geometry Constrained Depth Estimation Framework for Indoor Panorama Images
- Authors: Kanglin Ning, Ruzhao Chen, Penghong Wang, Xingtao Wang, Ruiqin Xiong, Xiaopeng Fan
- Abstract summary: Existing methods focus on pixel-level accuracy, causing oversmoothed room corners and noise sensitivity. We propose a depth estimation framework based on room geometry constraints. Our framework incorporates two strategies: a room-geometry-based background depth resolving strategy and a background-segmentation-guided fusion mechanism.
- Score: 50.84536164535991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting spherical pixel depth from monocular $360^{\circ}$ indoor panoramas is critical for many vision applications. However, existing methods focus on pixel-level accuracy, causing oversmoothed room corners and noise sensitivity. In this paper, we propose a depth estimation framework based on room geometry constraints, which extracts room geometry information through layout prediction and integrates this information into the depth estimation process through a background segmentation mechanism. At the model level, our framework comprises a shared feature encoder followed by task-specific decoders for layout estimation, depth estimation, and background segmentation. The shared encoder extracts multi-scale features, which are subsequently processed by individual decoders to generate initial predictions: a depth map, a room layout map, and a background segmentation map. Furthermore, our framework incorporates two strategies: a room-geometry-based background depth resolving strategy and a background-segmentation-guided fusion mechanism. The proposed room-geometry-based background depth resolving strategy leverages the room layout and the depth decoder's output to generate the corresponding background depth map. Then, a background-segmentation-guided fusion strategy derives fusion weights for the background and coarse depth maps from the segmentation decoder's predictions. Extensive experimental results on the Stanford2D3D, Matterport3D and Structured3D datasets show that our proposed methods significantly outperform current open-source methods. Our code is available at https://github.com/emiyaning/RGCNet.
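The background-segmentation-guided fusion described in the abstract amounts to a per-pixel convex combination of the coarse depth map and the layout-derived background depth map, weighted by the segmentation decoder's background probability. The minimal sketch below illustrates that idea only; the function and variable names are our own, not the paper's API.

```python
def fuse_depth(coarse_depth, background_depth, bg_probability):
    """Per-pixel convex combination of the coarse depth map and the
    geometry-derived background depth map, weighted by the background
    segmentation probability (names are illustrative, not the paper's)."""
    fused = []
    for d_coarse, d_bg, w in zip(coarse_depth, background_depth, bg_probability):
        # High background probability -> trust the layout-derived depth;
        # low probability (foreground objects) -> trust the coarse estimate.
        fused.append(w * d_bg + (1.0 - w) * d_coarse)
    return fused

# Flattened toy maps: two background pixels and one foreground pixel.
coarse = [2.9, 3.2, 1.5]
background = [3.0, 3.0, 4.0]   # layout predicts a flat wall at 3 m
weights = [1.0, 1.0, 0.0]      # segmentation: last pixel is foreground
print(fuse_depth(coarse, background, weights))  # -> [3.0, 3.0, 1.5]
```

In the actual framework the weights are predicted per pixel by the segmentation decoder rather than given, but the fusion rule itself has this shape.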
Related papers
- FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers [91.59069344768858]
We introduce Frequency-aware Positional Depth Embedding (FreqPDE) to equip 2D image features with spatial information for the 3D detection transformer decoder. FreqPDE combines the 2D image features and 3D position embeddings to generate 3D depth-aware features for query decoding.
arXiv Detail & Related papers (2025-10-17T07:36:54Z) - Detail-aware multi-view stereo network for depth estimation [4.8203572077041335]
We propose a detail-aware multi-view stereo network (DA-MVSNet) with a coarse-to-fine framework. The geometric depth clues hidden in the coarse stage are utilized to maintain geometric structural relationships. Experiments on the DTU and Tanks & Temples datasets demonstrate that our method achieves competitive results.
arXiv Detail & Related papers (2025-03-31T03:23:39Z) - Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering [6.372979654151044]
Current state-of-the-art monocular depth estimators, trained on extensive datasets, generalize well but lack the 3D consistency needed for many applications. In this paper, we combine the strength of these generalizing monocular depth estimation techniques with multi-view data by framing the task as an analysis-by-synthesis optimization problem. Our method generates detailed, high-quality, view-consistent, accurate depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art multi-view depth reconstruction approaches on such datasets.
arXiv Detail & Related papers (2024-10-04T18:50:28Z) - Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries have significant performance gaps, even using the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z) - P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
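A piecewise planarity prior rests on the standard plane-induced depth relation: for a plane satisfying n·X = offset and a unit viewing ray r through a pixel, the depth along the ray is offset / (n·r). The sketch below shows that relation in isolation; it is not P3Depth's learned per-pixel surface parameterization, and the function name is ours.

```python
def plane_induced_depth(normal, offset, ray):
    """Depth along a viewing ray for a plane n . X = offset.
    Returns None when the ray is (nearly) parallel to the plane."""
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        return None
    return offset / denom

# Frontal wall z = 2: normal (0, 0, 1), offset 2.
# A ray looking straight ahead hits it at depth 2.
print(plane_induced_depth((0.0, 0.0, 1.0), 2.0, (0.0, 0.0, 1.0)))  # -> 2.0
```

Coplanar pixels share one (normal, offset) pair, which is why propagating plane parameters between such pixels constrains their depths jointly.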
arXiv Detail & Related papers (2022-04-05T10:03:52Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
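The layout-to-depth conversion mentioned above rests on simple panorama geometry: in an equirectangular image, the floor-wall boundary seen at latitude φ below the horizon, from a camera at height h above the floor, lies at horizontal distance h / tan(φ). The sketch below shows only this basic relation, not LED2-Net's differentiable rendering procedure; the function name and parameters are our own.

```python
import math

def layout_to_floor_distance(camera_height, boundary_latitude_rad):
    """Horizontal distance to the wall at one panorama column, given the
    floor-wall boundary latitude (angle below the horizon, in radians)
    and the camera height above the floor."""
    return camera_height / math.tan(boundary_latitude_rad)

# Camera 1.6 m above the floor; boundary seen 45 degrees below the horizon.
print(layout_to_floor_distance(1.6, math.radians(45.0)))  # -> ~1.6
```

Because this relation is smooth in the boundary angle, gradients can flow from a depth loss back to the layout prediction, which is what makes the layout-to-depth conversion differentiable.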
arXiv Detail & Related papers (2021-04-01T15:48:41Z) - Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z) - GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes [18.900646770506256]
We propose to incorporate geometric reasoning into deep learning for layout estimation.
Our approach learns to infer the depth maps of the dominant planes in the scene by predicting the pixel-level surface parameters.
We present a new dataset with pixel-level depth annotation of dominant planes.
arXiv Detail & Related papers (2020-08-14T10:34:24Z) - Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.