360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes
- URL: http://arxiv.org/abs/2404.16501v1
- Date: Thu, 25 Apr 2024 10:52:08 GMT
- Title: 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes
- Authors: Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang
- Abstract summary: We address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation.
360SFUDA++ effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images.
- Score: 15.367186190755003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++, which effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP), as it has less distortion, and meanwhile split the equirectangular projection (ERP) into patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown to be effective in extracting knowledge from the source model. However, as the distinct projections make it harder to directly transfer knowledge between domains, we then propose the Reliable Panoramic Prototype Adaptation Module (RP$^2$AM) to transfer knowledge at both prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce the Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.
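The abstract's idea of selecting confident knowledge and building category prototypes can be illustrated with a minimal sketch. This is not the authors' RP$^2$AM implementation: the function name `confident_prototypes`, the 0.9 confidence threshold, and the use of masked average pooling over a single feature map are all assumptions chosen for illustration.

```python
import numpy as np


def confident_prototypes(features, probs, threshold=0.9):
    """Average the features of confidently predicted pixels, per class.

    features: (C, H, W) feature map (hypothetical backbone output).
    probs:    (K, H, W) per-pixel class probabilities (e.g. softmax output).
    Returns a (K, C) prototype array and a (K,) boolean array marking the
    classes that had at least one confident pixel.
    """
    num_classes = probs.shape[0]
    channels = features.shape[0]

    flat_feats = features.reshape(channels, -1).T   # (H*W, C)
    flat_probs = probs.reshape(num_classes, -1)     # (K, H*W)

    labels = flat_probs.argmax(axis=0)              # pseudo-label per pixel
    confidence = flat_probs.max(axis=0)             # its probability
    keep = confidence >= threshold                  # assumed selection rule

    prototypes = np.zeros((num_classes, channels), dtype=float)
    valid = np.zeros(num_classes, dtype=bool)
    for k in range(num_classes):
        mask = keep & (labels == k)
        if mask.any():
            prototypes[k] = flat_feats[mask].mean(axis=0)
            valid[k] = True
    return prototypes, valid
```

Classes with no confident pixel keep a zero prototype and are flagged `False` in the returned mask, so a caller can skip them instead of adapting toward an unreliable estimate.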
Related papers
- Multi-source Domain Adaptation for Panoramic Semantic Segmentation [22.367890439050786]
We propose a new task of multi-source domain adaptation for panoramic semantic segmentation.
We aim to utilize both real pinhole and synthetic panoramic images in the source domains, enabling the segmentation model to perform well on unlabeled real panoramic images.
DTA4PASS converts all pinhole images in the source domains into panoramic-like images, and then aligns the converted source domains with the target domain.
arXiv Detail & Related papers (2024-08-29T12:00:11Z)
- Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation [15.367186190755003]
This paper addresses a problem -- source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation.
Tackling this problem is nontrivial due to the semantic mismatches, style discrepancies, and inevitable distortion of panoramic images.
We propose a novel method that utilizes Tangent Projection (TP), as it has less distortion, and splits the equirectangular projection (ERP) with a fixed FoV to mimic the pinhole images.
arXiv Detail & Related papers (2024-03-19T07:11:53Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Effective Adapter for Face Recognition in the Wild [72.75516495170199]
We tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions.
Traditional approaches, either training models directly on degraded images or on their enhanced counterparts using face restoration techniques, have proven ineffective.
We propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets.
arXiv Detail & Related papers (2023-12-04T08:55:46Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation [4.566642023113164]
The ability of scene understanding has sparked active research for panoramic image semantic segmentation.
Some works treat the equirectangular projection (ERP) and pinhole images equally and transfer knowledge from the pinhole to ERP images via unsupervised domain adaptation (UDA).
We propose a novel yet flexible dual-path UDA framework, DPPASS, taking ERP and tangent projection (TP) images as inputs.
arXiv Detail & Related papers (2023-03-25T04:57:45Z)
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View [11.958753088613637]
We first analyze the causes of the domain gap for the MV3D-Det task.
To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera.
We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z)
- PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z)
- Transfer beyond the Field of View: Dense Panoramic Semantic Segmentation via Unsupervised Domain Adaptation [30.104947024614127]
We formalize the task of unsupervised domain adaptation for panoramic semantic segmentation.
DensePASS is a novel dataset for panoramic segmentation under cross-domain conditions.
We introduce P2PDA, a generic framework for Pinhole-to-Panoramic semantic segmentation.
arXiv Detail & Related papers (2021-10-21T11:22:05Z)
- Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance [148.9832328803202]
We model the information fusion within focal stack via graph networks.
We build a novel dual graph model to guide the focal stack fusion process using all-focus patterns.
arXiv Detail & Related papers (2021-10-02T00:54:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.