PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation
- URL: http://arxiv.org/abs/2510.11992v1
- Date: Mon, 13 Oct 2025 22:40:49 GMT
- Title: PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation
- Authors: Hatem Ibrahem, Ahmed Salem, Qinmin Vivian Hu, Guanghui Wang
- Abstract summary: Accurately estimating the 3D layout of rooms is a crucial task in computer vision. This paper proposes a novel model, PanoTPS-Net, to estimate room layout from a single panorama image.
- Score: 9.400960986963328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately estimating the 3D layout of rooms is a crucial task in computer vision, with potential applications in robotics, augmented reality, and interior design. This paper proposes a novel model, PanoTPS-Net, to estimate room layout from a single panorama image. Leveraging a Convolutional Neural Network (CNN) and incorporating a Thin Plate Spline (TPS) spatial transformation, the architecture of PanoTPS-Net is divided into two stages: First, a convolutional neural network extracts the high-level features from the input images, allowing the network to learn the spatial parameters of the TPS transformation. Second, the TPS spatial transformation layer is generated to warp a reference layout to the required layout based on the predicted parameters. This unique combination empowers the model to properly predict room layouts while also generalizing effectively to both cuboid and non-cuboid layouts. Extensive experiments on publicly available datasets and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed method. The results underscore the model's accuracy in room layout estimation and emphasize the compatibility between the TPS transformation and panorama images. The robustness of the model in handling both cuboid and non-cuboid room layout estimation is evident with a 3DIoU value of 85.49, 86.16, 81.76, and 91.98 on PanoContext, Stanford-2D3D, Matterport3DLayout, and ZInD datasets, respectively. The source code is available at: https://github.com/HatemHosam/PanoTPS_Net.
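The two-stage idea above (predict TPS parameters, then warp a reference layout) rests on the classic thin plate spline fit-and-warp. Below is a minimal NumPy sketch of that standard TPS machinery, not the paper's implementation: the control-point setup, function names, and regularization constant are assumptions for illustration only.

```python
import numpy as np

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), with U(0) = 0 by convention.
    return np.where(r2 == 0, 0.0, r2 * np.log(r2 + 1e-12))

def fit_tps(src, dst):
    """Solve for TPS parameters mapping src control points to dst, both (n, 2)."""
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2)                        # (n, n) radial terms
    P = np.hstack([np.ones((n, 1)), src])     # (n, 3) affine terms
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)              # (n+3, 2) params: [weights; affine]

def warp_tps(params, src, pts):
    """Apply the fitted TPS to arbitrary query points (m, 2)."""
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    U = tps_kernel(d2)                        # (m, n) radial responses
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:src.shape[0]] + P @ params[src.shape[0]:]
```

In the paper's setting, the CNN would predict the displaced control points (here `dst`) from the panorama, and the warp would be applied to a reference layout rather than to arbitrary query points.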
Related papers
- PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment [26.610824644310846]
We introduce PixCuboid, an optimization-based approach for cuboid-shaped room layout estimation. By training through the optimization end-to-end, we learn feature maps that yield large convergence basins and smooth loss landscapes. In thorough experiments we validate our approach and significantly outperform the competition.
arXiv Detail & Related papers (2025-08-06T17:27:50Z)
- Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model [15.892685514932323]
We introduce Plane-DUSt3R, a novel method for multi-view room layout estimation. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structure3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results.
arXiv Detail & Related papers (2025-02-24T02:14:19Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- iBARLE: imBalance-Aware Room Layout Estimation [54.819085005591894]
Room layout estimation predicts layouts from a single panorama.
There are significant imbalances in real-world datasets including the dimensions of layout complexity, camera locations, and variation in scene appearance.
We propose imBalance-Aware Room Layout Estimation (iBARLE) framework to address these issues.
iBARLE consists of (1) an Appearance Variation Generation (AVG) module, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t. room structure, and (3) a gradient-based layout objective function.
arXiv Detail & Related papers (2023-08-29T06:20:36Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network [44.06968418800436]
We present a complete panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas.
The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network (GPR-Net).
Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on a large-scale indoor panorama dataset ZInD.
arXiv Detail & Related papers (2022-10-20T17:10:41Z)
- 3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform [17.51123287432334]
We present an alternative approach to estimate the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block.
We transform the image feature from a cubemap tile to the Hough space of a Manhattan world and directly map the feature to geometric output.
The convolutional layers not only learn the local gradient-like line features, but also utilize the global information to successfully predict occluded walls with a simple network structure.
arXiv Detail & Related papers (2022-07-19T14:22:28Z)
- LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network [1.3512949730789903]
We propose an efficient network, LGT-Net, for room layout estimation.
Experiments show that the proposed LGT-Net achieves better performance than current state-of-the-art (SOTA) methods on benchmark datasets.
arXiv Detail & Related papers (2022-03-03T16:28:10Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
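The layout-to-depth idea that LED2-Net's formulation builds on can be illustrated with basic equirectangular geometry: for a level camera, the floor-boundary latitude in each image column fixes the horizontal distance to the wall along that column. The sketch below is a hedged illustration under an assumed camera height (1.6 m is a common convention, not stated above); it is not the paper's differentiable rendering code.

```python
import numpy as np

def floor_boundary_to_horizon_depth(v_floor, cam_height=1.6):
    """
    Convert per-column floor-boundary latitudes (radians, measured downward
    from the horizon) of an equirectangular panorama into horizontal depths
    on the horizon line, assuming a level camera at height `cam_height`.
    """
    v = np.asarray(v_floor, dtype=float)
    # Right-triangle geometry: the camera height over tan(latitude)
    # gives the horizontal distance to the wall-floor intersection.
    return cam_height / np.tan(v)
```

Because every operation here is a smooth elementwise function, the same conversion remains differentiable when written in an autodiff framework, which is the property a differentiable depth-rendering loss relies on.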
This list is automatically generated from the titles and abstracts of the papers on this site.