Bending Reality: Distortion-aware Transformers for Adapting to Panoramic
Semantic Segmentation
- URL: http://arxiv.org/abs/2203.01452v1
- Date: Wed, 2 Mar 2022 23:00:32 GMT
- Title: Bending Reality: Distortion-aware Transformers for Adapting to Panoramic
Semantic Segmentation
- Authors: Jiaming Zhang, Kailun Yang, Chaoxiang Ma, Simon Reiß, Kunyu Peng,
Rainer Stiefelhagen
- Abstract summary: Large quantities of expensive, pixel-wise annotations are crucial for success of robust panoramic segmentation models.
Distortions and the distinct image-feature distribution in 360-degree panoramas impede the transfer from the annotation-rich pinhole domain.
We learn object deformations and panoramic image distortions in the Deformable Patch Embedding (DPE) and Deformable MLP (DMLP) components.
Finally, we tie together shared semantics in pinhole- and panoramic feature embeddings by generating multi-scale prototype features.
- Score: 26.09267582056609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic images with their 360-degree directional view encompass exhaustive
information about the surrounding space, providing a rich foundation for scene
understanding. To unfold this potential in the form of robust panoramic
segmentation models, large quantities of expensive, pixel-wise annotations are
crucial for success. Such annotations are available, but predominantly for
narrow-angle, pinhole-camera images which, off the shelf, serve as sub-optimal
resources for training panoramic models. Distortions and the distinct
image-feature distribution in 360-degree panoramas impede the transfer from the
annotation-rich pinhole domain and therefore come with a big dent in
performance. To get around this domain difference and bring together semantic
annotations from pinhole- and 360-degree surround-visuals, we propose to learn
object deformations and panoramic image distortions in the Deformable Patch
Embedding (DPE) and Deformable MLP (DMLP) components which blend into our
Transformer for PAnoramic Semantic Segmentation (Trans4PASS) model. Finally, we
tie together shared semantics in pinhole- and panoramic feature embeddings by
generating multi-scale prototype features and aligning them in our Mutual
Prototypical Adaptation (MPA) for unsupervised domain adaptation. On the indoor
Stanford2D3D dataset, our Trans4PASS with MPA maintains comparable performance
to fully-supervised state-of-the-arts, cutting the need for over 1,400 labeled
panoramas. On the outdoor DensePASS dataset, we break state-of-the-art by
14.39% mIoU and set the new bar at 56.38%. Code will be made publicly available
at https://github.com/jamycheung/Trans4PASS.
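As a rough illustration of the two ideas described in the abstract, the following is a minimal numpy sketch: a toy deformable patch embedding that samples each patch at shifted pixel locations, and a toy mutual-prototype alignment loss. All function names and details here are hypothetical stand-ins (e.g., offsets are random rather than predicted by a network, and prototypes are single-scale rather than multi-scale); the authors' actual implementation is at the GitHub link above.

```python
import numpy as np

def deformable_patch_embed(img, patch=4, max_offset=2, seed=0):
    """Toy deformable patch embedding: each patch reads its pixels at
    per-patch offsets (predicted by a network in Trans4PASS; random
    stand-ins here), so patches can 'bend' toward distorted content."""
    h, w = img.shape
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    tokens = np.zeros((gh, gw, patch * patch))
    for i in range(gh):
        for j in range(gw):
            # sample this patch's grid shifted by (dy, dx), clipped to bounds
            dy, dx = rng.integers(-max_offset, max_offset + 1, size=2)
            ys = np.clip(np.arange(i * patch, (i + 1) * patch) + dy, 0, h - 1)
            xs = np.clip(np.arange(j * patch, (j + 1) * patch) + dx, 0, w - 1)
            tokens[i, j] = img[np.ix_(ys, xs)].ravel()
    return tokens  # (gh, gw, patch*patch) token grid

def class_prototypes(feats, labels, num_classes):
    """Mean feature vector (prototype) per class."""
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(axis=0)
    return protos

def mpa_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    """Toy mutual prototypical adaptation: average source (pinhole) and
    target (panoramic) prototypes into a shared one, then pull features
    of both domains toward the shared prototype of their (pseudo-)class."""
    p_src = class_prototypes(src_feats, src_labels, num_classes)
    p_tgt = class_prototypes(tgt_feats, tgt_pseudo, num_classes)
    p_mut = 0.5 * (p_src + p_tgt)  # shared semantics across domains
    d_src = np.linalg.norm(src_feats - p_mut[src_labels], axis=1)
    d_tgt = np.linalg.norm(tgt_feats - p_mut[tgt_pseudo], axis=1)
    return d_src.mean() + d_tgt.mean()
```

When source and target features for a class coincide, the mutual prototype equals both and the loss vanishes; driving this loss down during adaptation is what pulls the two feature distributions together.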
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z) - Multi-source Domain Adaptation for Panoramic Semantic Segmentation [22.367890439050786]
We propose a new task of multi-source domain adaptation for panoramic semantic segmentation.
We aim to utilize both real pinhole and synthetic panoramic images in the source domains, enabling the segmentation model to perform well on unlabeled real panoramic images.
DTA4PASS converts all pinhole images in the source domains into panoramic-like images, and then aligns the converted source domains with the target domain.
arXiv Detail & Related papers (2024-08-29T12:00:11Z) - Open Panoramic Segmentation [34.46596562350091]
We propose a new task called Open Panoramic Segmentation (OPS), where models are trained with FoV-restricted pinhole images in an open-vocabulary setting.
We also propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves panoramic semantic segmentation performance.
Surpassing other state-of-the-art open-vocabulary semantic segmentation approaches, OOOPS achieves a remarkable performance boost on three panoramic datasets.
arXiv Detail & Related papers (2024-07-02T22:00:32Z) - SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic
Segmentation [53.5256153325136]
PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view.
Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of original $360^\circ$ data.
We propose Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS) to be more robust to 3D disturbance.
arXiv Detail & Related papers (2023-06-06T04:49:51Z) - PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline
Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z) - Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation [73.48323921632506]
We address panoramic semantic segmentation which is under-explored due to two critical challenges.
First, we propose an upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules.
Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation.
Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images.
arXiv Detail & Related papers (2022-07-25T00:42:38Z) - Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for
Mobile Agents via Unsupervised Contrastive Learning [93.6645991946674]
We introduce panoramic panoptic segmentation, as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to a mobile agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2022-06-21T20:07:15Z) - Transfer beyond the Field of View: Dense Panoramic Semantic Segmentation
via Unsupervised Domain Adaptation [30.104947024614127]
We formalize the task of unsupervised domain adaptation for panoramic semantic segmentation.
DensePASS is a novel dataset for panoramic segmentation under cross-domain conditions.
We introduce P2PDA - a generic framework for Pinhole-to-Panoramic semantic segmentation.
arXiv Detail & Related papers (2021-10-21T11:22:05Z) - Panoramic Panoptic Segmentation: Towards Complete Surrounding
Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.