PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
- URL: http://arxiv.org/abs/2308.14726v1
- Date: Mon, 28 Aug 2023 17:30:14 GMT
- Title: PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
- Authors: Zhixin Ling, Zhen Xing, Xiangdong Zhou, Manliang Cao, Guichun Zhou
- Abstract summary: Equirectangular projection (ERP) entails boundary discontinuity and spatial distortion.
We propose PanoSwin to learn panorama representations with ERP.
We conduct experiments against the state-of-the-art on various panoramic tasks.
- Score: 15.115868803355081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In panorama understanding, the widely used equirectangular projection (ERP)
entails boundary discontinuity and spatial distortion, which severely degrade
the performance of conventional CNNs and vision Transformers on panoramas. In this paper, we
propose a simple yet effective architecture named PanoSwin to learn panorama
representations with ERP. To deal with the challenges brought by
equirectangular projection, we explore a pano-style shift windowing scheme and
a novel pitch attention mechanism to address the boundary discontinuity and the
spatial distortion, respectively. In addition, based on spherical distance and Cartesian
coordinates, we adapt absolute positional embeddings and relative positional
biases for panoramas to enhance panoramic geometry information. Realizing that
planar image understanding might share some common knowledge with panorama
understanding, we devise a novel two-stage learning framework to facilitate
knowledge transfer from the planar images to panoramas. We conduct experiments
against the state-of-the-art on various panoramic tasks, i.e., panoramic object
detection, panoramic classification, and panoramic layout estimation. The
experimental results demonstrate the effectiveness of PanoSwin in panorama
understanding.
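The abstract only sketches these components. As a rough illustration of the geometry involved, the snippet below shows two ideas it alludes to: rolling an ERP feature map horizontally so the left/right seam falls inside an attention window (the general motivation behind boundary-aware shift windowing), and computing great-circle distances between pixel centers, one natural ingredient for a panorama-aware relative positional bias. This is a minimal sketch under assumed conventions (image x mapped to longitude, y to latitude); the helper names `roll_erp_half_window`, `erp_pixel_to_lonlat`, and `spherical_distance_matrix` are illustrative and not from the paper.

```python
import numpy as np

def roll_erp_half_window(feat, window_w):
    """Roll an ERP feature map (H, W, C) horizontally by half a window width.

    ERP panoramas wrap around horizontally, so a plain roll is lossless and
    moves the left/right seam into the interior, where window attention can
    attend across it. (Illustrative only; PanoSwin's actual pano-style shift
    windowing scheme is more involved.)
    """
    return np.roll(feat, shift=window_w // 2, axis=1)

def erp_pixel_to_lonlat(h, w):
    """Longitude/latitude (radians) of each pixel center in an H x W ERP grid."""
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # [-pi, pi)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi   # [pi/2, -pi/2]
    return np.meshgrid(lon, lat)                            # each of shape (H, W)

def spherical_distance_matrix(lon, lat):
    """Pairwise great-circle distance between all pixels (flattened to N = H*W).

    Spherical distances, rather than planar offsets, better reflect panoramic
    geometry and could feed a relative positional bias.
    """
    lon, lat = lon.ravel(), lat.ravel()
    # Unit-sphere Cartesian coordinates of each pixel center.
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)                  # (N, 3)
    cosine = np.clip(xyz @ xyz.T, -1.0, 1.0)                # (N, N)
    return np.arccos(cosine)                                # angles in [0, pi]

if __name__ == "__main__":
    lon, lat = erp_pixel_to_lonlat(h=8, w=16)
    print(spherical_distance_matrix(lon, lat).shape)        # (128, 128)
    feat = np.random.rand(8, 16, 32)
    print(roll_erp_half_window(feat, window_w=4).shape)     # (8, 16, 32)
```

In a real model the distance matrix would typically be bucketed or passed through a small learned mapping to produce per-head attention biases; the paper's exact formulation of the positional embeddings, pitch attention, and two-stage transfer should be taken from the source.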
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z) - VidPanos: Generative Panoramic Videos from Casual Panning Videos [73.77443496436749]
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view.
We present a method for synthesizing a panoramic video from a casually-captured panning video.
Our system can create video panoramas for a range of in-the-wild scenes including people, vehicles, and flowing water.
arXiv Detail & Related papers (2024-10-17T17:53:24Z) - Mixed-View Panorama Synthesis using Geospatially Guided Diffusion [15.12293324464805]
We introduce the task of mixed-view panorama synthesis.
The goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area.
arXiv Detail & Related papers (2024-07-12T20:12:07Z) - PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline
Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z) - Panoramic Image-to-Image Translation [37.9486466936501]
We tackle the challenging task of Panoramic Image-to-Image translation (Pano-I2I) for the first time.
This task is difficult due to the geometric distortion of panoramic images and the lack of a panoramic image dataset with diverse conditions, like weather or time.
We propose a panoramic distortion-aware I2I model that preserves the structure of the panoramic images while consistently translating their global style referenced from a pinhole image.
arXiv Detail & Related papers (2023-04-11T04:08:58Z) - PanoViT: Vision Transformer for Room Layout Estimation from a Single
Panoramic Image [11.053777620735175]
PanoViT is a panorama vision transformer to estimate the room layout from a single panoramic image.
Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image.
Our method outperforms state-of-the-art solutions in room layout prediction accuracy.
arXiv Detail & Related papers (2022-12-23T05:37:11Z) - Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation [73.48323921632506]
We address panoramic semantic segmentation, which is under-explored due to two critical challenges.
First, we propose an upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules.
Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation.
Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images.
arXiv Detail & Related papers (2022-07-25T00:42:38Z) - Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for
Mobile Agents via Unsupervised Contrastive Learning [93.6645991946674]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete understanding of the surroundings provides a mobile agent with the maximum amount of information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2022-06-21T20:07:15Z) - PanoFormer: Panorama Transformer for Indoor 360{\deg} Depth Estimation [35.698249161263966]
Existing panoramic depth estimation methods based on convolutional neural networks (CNNs) focus on removing panoramic distortions.
This paper proposes the panorama transformer (named PanoFormer) to estimate the depth in panorama images.
In particular, we divide patches on the spherical tangent domain into tokens to reduce the negative effect of panoramic distortions.
arXiv Detail & Related papers (2022-03-17T12:19:43Z) - Panoramic Panoptic Segmentation: Towards Complete Surrounding
Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete understanding of the surroundings provides the agent with the maximum amount of information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.