LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware
Transformer Network
- URL: http://arxiv.org/abs/2203.01824v1
- Date: Thu, 3 Mar 2022 16:28:10 GMT
- Title: LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware
Transformer Network
- Authors: Zhigang Jiang, Zhongzheng Xiang, Jinhua Xu, Ming Zhao
- Abstract summary: We propose an efficient network, LGT-Net, for room layout estimation.
Experiments show that the proposed LGT-Net achieves better performance than current state-of-the-art (SOTA) methods on benchmark datasets.
- Score: 1.3512949730789903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D room layout estimation by a single panorama using deep neural networks has
made great progress. However, previous approaches cannot obtain efficient
geometry awareness of the room layout from only the latitude of boundaries or
the horizon-depth. We show that using horizon-depth along with room height can
obtain omnidirectional-geometry awareness of room layout in both horizontal and
vertical directions. In addition, we propose a planar-geometry aware loss
function with normals and gradients of normals to supervise the planeness of
walls and turning of corners. We propose an efficient network, LGT-Net, for
room layout estimation, which contains a novel Transformer architecture called
SWG Transformer to model geometry relations. SWG Transformer consists of
(Shifted) Window Blocks and Global Blocks to combine the local and global
geometry relations. Moreover, we design a novel relative position embedding of
Transformer to enhance the spatial identification ability for the panorama.
Experiments show that the proposed LGT-Net achieves better performance than
current state-of-the-art (SOTA) methods on benchmark datasets.
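The omnidirectional-geometry idea in the abstract can be made concrete: once a network predicts a horizon-depth value per image column plus a single room height, the floor and ceiling boundaries follow from simple equirectangular geometry. The sketch below is illustrative only, not the authors' implementation; the helper name `layout_from_horizon_depth` and the normalized camera height are assumptions.

```python
import numpy as np

def layout_from_horizon_depth(depth, room_height, camera_height=1.6):
    """Convert per-column horizon-depth and a room height into 3D
    floor/ceiling boundary points (hypothetical helper, not LGT-Net code).

    depth: (N,) horizontal distance from the camera to the wall at each
           equirectangular image column.
    """
    n = len(depth)
    # Longitude of each column, spanning [-pi, pi) across the panorama.
    lon = (np.arange(n) / n) * 2 * np.pi - np.pi
    # Horizontal position of the wall point at each column.
    x = depth * np.sin(lon)
    z = depth * np.cos(lon)
    # Floor points sit camera_height below the camera; ceiling points sit
    # (room_height - camera_height) above it, sharing the same (x, z).
    floor = np.stack([x, np.full(n, -camera_height), z], axis=1)
    ceiling = np.stack([x, np.full(n, room_height - camera_height), z], axis=1)
    return floor, ceiling

# A toy cylindrical "room": constant depth 2.0 m, room height 3.0 m.
floor, ceiling = layout_from_horizon_depth(np.full(8, 2.0), room_height=3.0)
```

This is why a single extra scalar (room height) upgrades a purely horizontal representation to full vertical geometry: floor and ceiling share the same horizontal footprint and differ only in their y coordinate.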
Related papers
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- Atlanta Scaled layouts from non-central panoramas [5.2178708158547025]
We present a novel approach for 3D layout recovery of indoor environments using a non-central acquisition system.
Our approach is the first work using deep learning on non-central panoramas and recovering scaled layouts from single panoramas.
arXiv Detail & Related papers (2024-01-30T14:39:38Z)
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement [16.103152098205566]
We aim to synthesize floorplans as sequences of 1-D vectors, which eases user interaction and design customization.
In the first stage, we encode the room connectivity graph input by users with a graph convolutional network (GCN), then apply an autoregressive transformer network to generate an initial floorplan sequence.
To polish the initial design and generate more visually appealing floorplans, we further propose a novel panoptic refinement network (PRN) composed of a GCN and a transformer network.
arXiv Detail & Related papers (2022-07-27T03:19:20Z)
- 3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform [17.51123287432334]
We present an alternative approach to estimate the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block.
We transform the image feature from a cubemap tile to the Hough space of a Manhattan world and directly map the feature to geometric output.
The convolutional layers not only learn the local gradient-like line features, but also utilize the global information to successfully predict occluded walls with a simple network structure.
arXiv Detail & Related papers (2022-07-19T14:22:28Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that can efficiently perceive global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image [32.5277483805739]
Single-image room layout reconstruction aims to reconstruct the enclosed 3D structure of a room from a single image.
This paper considers a more general indoor assumption, i.e., the room layout consists of a single ceiling, a single floor, and several vertical walls.
arXiv Detail & Related papers (2021-04-16T09:24:08Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
- GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes [18.900646770506256]
We propose to incorporate geometric reasoning to deep learning for layout estimation.
Our approach learns to infer the depth maps of the dominant planes in the scene by predicting the pixel-level surface parameters.
We present a new dataset with pixel-level depth annotation of dominant planes.
arXiv Detail & Related papers (2020-08-14T10:34:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences of its use.