PanoViT: Vision Transformer for Room Layout Estimation from a Single
Panoramic Image
- URL: http://arxiv.org/abs/2212.12156v1
- Date: Fri, 23 Dec 2022 05:37:11 GMT
- Authors: Weichao Shen, Yuan Dong, Zonghao Chen, Zhengyi Zhao, Yang Gao, and Zhu
Liu
- Abstract summary: PanoViT is a panorama vision transformer that estimates the room layout from a single panoramic image.
Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image.
Our method outperforms state-of-the-art solutions in room layout prediction accuracy.
- Score: 11.053777620735175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose PanoViT, a panorama vision transformer to estimate
the room layout from a single panoramic image. Compared to CNN models, our
PanoViT is more proficient in learning global information from the panoramic
image for the estimation of complex room layouts. Considering the difference
between a perspective image and an equirectangular image, we design a novel
recurrent position embedding and a patch sampling method for the processing of
panoramic images. In addition to extracting global information, PanoViT also
includes a frequency-domain edge enhancement module and a 3D loss to extract
local geometric features in a panoramic image. Experimental results on several
datasets demonstrate that our method outperforms state-of-the-art solutions in
room layout prediction accuracy.
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
- PanoMixSwap: Panorama Mixing via Structural Swapping for Indoor Scene Understanding [14.489840196199882]
PanoMixSwap is a novel data augmentation technique specifically designed for indoor panoramic images.
We decompose each panoramic image into its constituent parts: background style, foreground furniture, and room layout.
We generate an augmented image by mixing these three parts from three different images, such as the foreground furniture from one image, the background style from another image, and the room structure from the third image.
arXiv Detail & Related papers (2023-09-18T06:52:13Z) - PanoSwin: a Pano-style Swin Transformer for Panorama Understanding [15.115868803355081]
Equirectangular projection (ERP) entails boundary discontinuity and spatial distortion.
We propose PanoSwin to learn panorama representations with ERP.
We conduct experiments against the state-of-the-art on various panoramic tasks.
arXiv Detail & Related papers (2023-08-28T17:30:14Z) - PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline
Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z) - PanoGen: Text-Conditioned Panoramic Environment Generation for
Vision-and-Language Navigation [96.8435716885159]
Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments.
One main challenge in VLN is the limited availability of training environments, which makes it hard to generalize to new and unseen environments.
We propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text.
arXiv Detail & Related papers (2023-05-30T16:39:54Z) - PanoContext-Former: Panoramic Total Scene Understanding with a
Transformer [37.51637352106841]
Panoramic images enable a deeper understanding and more holistic perception of the $360^\circ$ surrounding environment.
In this paper, we propose a novel method using depth prior for holistic indoor scene understanding.
In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes.
arXiv Detail & Related papers (2023-05-21T16:20:57Z) - Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for
Mobile Agents via Unsupervised Contrastive Learning [93.6645991946674]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to a mobile agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2022-06-21T20:07:15Z) - GLPanoDepth: Global-to-Local Panoramic Depth Estimation [18.06592473599777]
We propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image.
We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals.
This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
arXiv Detail & Related papers (2022-02-06T15:11:58Z) - DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene
Context Graph and Relation-based Optimization [66.25948693095604]
We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image.
Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
arXiv Detail & Related papers (2021-08-24T13:55:29Z) - Panoramic Panoptic Segmentation: Towards Complete Surrounding
Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.