Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
- URL: http://arxiv.org/abs/2207.11860v5
- Date: Fri, 31 May 2024 16:04:07 GMT
- Title: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
- Authors: Jiaming Zhang, Kailun Yang, Hao Shi, Simon Reiß, Kunyu Peng, Chaoxiang Ma, Haodong Fu, Philip H. S. Torr, Kaiwei Wang, Rainer Stiefelhagen
- Abstract summary: We address panoramic semantic segmentation which is under-explored due to two critical challenges.
First, we propose an upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules.
Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation.
Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images.
- Score: 73.48323921632506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address panoramic semantic segmentation which is under-explored due to two critical challenges: (1) image distortions and object deformations on panoramas; (2) lack of semantic annotations in the 360° imagery. To tackle these problems, first, we propose the upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules for handling object deformations and image distortions whenever (before or after adaptation) and wherever (shallow or deep levels). Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation. Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images, facilitating Synthetic-to-Real (Syn2Real) adaptation scheme in 360° imagery. Extensive experiments are conducted, which cover indoor and outdoor scenarios, and each of them is investigated with Pin2Pan and Syn2Real regimens. Trans4PASS+ achieves state-of-the-art performances on four domain adaptive panoramic semantic segmentation benchmarks. Code is available at https://github.com/jamycheung/Trans4PASS.
Related papers
- Multi-source Domain Adaptation for Panoramic Semantic Segmentation [22.367890439050786]
We propose a new task of multi-source domain adaptation for panoramic semantic segmentation.
We aim to utilize both real pinhole and synthetic panoramic images in the source domains, enabling the segmentation model to perform well on unlabeled real panoramic images.
DTA4PASS converts all pinhole images in the source domains into panoramic-like images, and then aligns the converted source domains with the target domain.
arXiv Detail & Related papers (2024-08-29T12:00:11Z) - Dual-path Adaptation from Image to Video Transformers [62.056751480114784]
We efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters.
We propose a novel DualPath adaptation separated into spatial and temporal adaptation paths, where a lightweight bottleneck adapter is employed in each transformer block.
arXiv Detail & Related papers (2023-03-17T09:37:07Z) - DeViT: Deformed Vision Transformers in Video Inpainting [59.73019717323264]
We extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH).
Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wise feature matching.
Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens.
arXiv Detail & Related papers (2022-09-28T08:57:14Z) - UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer
via Hierarchical Mask Calibration [49.16591283724376]
We design UniDAformer, a unified domain adaptive panoptic segmentation transformer that is simple but can achieve domain adaptive instance segmentation and semantic segmentation simultaneously within a single network.
UniDAformer introduces Hierarchical Mask Calibration (HMC) that rectifies inaccurate predictions at the level of regions, superpixels and annotated pixels via online self-training on the fly.
It has three unique features: 1) it enables unified domain adaptive panoptic adaptation; 2) it mitigates false predictions and improves domain adaptive panoptic segmentation effectively; 3) it is end-to-end trainable with a much simpler training and inference pipeline.
arXiv Detail & Related papers (2022-06-30T07:32:23Z) - Cross-View Panorama Image Synthesis [68.35351563852335]
PanoGAN is a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z) - Bending Reality: Distortion-aware Transformers for Adapting to Panoramic
Semantic Segmentation [26.09267582056609]
Large quantities of expensive, pixel-wise annotations are crucial for the success of robust panoramic segmentation models.
Distortions and the distinct image-feature distribution in 360-degree panoramas impede the transfer from the annotation-rich pinhole domain.
We learn object deformations and panoramic image distortions in Deformable Patch Embedding (DPE) and Deformable MLP (DMLP) components.
Finally, we tie together shared semantics in pinhole- and panoramic feature embeddings by generating multi-scale prototype features.
arXiv Detail & Related papers (2022-03-02T23:00:32Z) - Transfer beyond the Field of View: Dense Panoramic Semantic Segmentation
via Unsupervised Domain Adaptation [30.104947024614127]
We formalize the task of unsupervised domain adaptation for panoramic semantic segmentation.
DensePASS is a novel dataset for panoramic segmentation under cross-domain conditions.
We introduce P2PDA - a generic framework for Pinhole-to-Panoramic semantic segmentation.
arXiv Detail & Related papers (2021-10-21T11:22:05Z) - DensePASS: Dense Panoramic Semantic Segmentation via Unsupervised Domain
Adaptation with Attention-Augmented Context Exchange [32.29797061415896]
We formalize the task of unsupervised domain adaptation for panoramic semantic segmentation.
A network trained on labelled examples from the source domain of pinhole camera data is deployed in a different target domain of panoramic images.
We build a generic framework for cross-domain panoramic semantic segmentation based on different variants of attention-augmented domain adaptation modules.
arXiv Detail & Related papers (2021-08-13T20:15:46Z) - Panoramic Panoptic Segmentation: Towards Complete Surrounding
Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)