BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain
Generalization of 3D Semantic Segmentation
- URL: http://arxiv.org/abs/2308.06530v1
- Date: Sat, 12 Aug 2023 11:09:17 GMT
- Authors: Miaoyu Li, Yachao Zhang, Xu Ma, Yanyun Qu, Yun Fu
- Abstract summary: Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the complementarity of 2D-3D data to overcome the lack of annotation in a new domain.
We propose cross-modal learning under bird's-eye view for Domain Generalization (DG) of 3D semantic segmentation, called BEV-DG.
- Score: 59.99683295806698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the
complementarity of 2D-3D data to overcome the lack of annotation in a new
domain. However, UDA methods rely on access to the target domain during
training, meaning the trained model only works in a specific target domain. In
light of this, we propose cross-modal learning under bird's-eye view for Domain
Generalization (DG) of 3D semantic segmentation, called BEV-DG. DG is more
challenging because the model cannot access the target domain during training,
meaning it needs to rely on cross-modal learning to alleviate the domain gap.
Since 3D semantic segmentation requires the classification of each point,
existing cross-modal learning is directly conducted point-to-point, which is
sensitive to the misalignment in projections between pixels and points. To this
end, our approach aims to optimize domain-irrelevant representation modeling
with the aid of cross-modal learning under bird's-eye view. We propose
BEV-based Area-to-area Fusion (BAF) to conduct cross-modal learning under
bird's-eye view, which has a higher fault tolerance for point-level
misalignment. Furthermore, to model domain-irrelevant representations, we
propose BEV-driven Domain Contrastive Learning (BDCL) with the help of
cross-modal learning under bird's-eye view. We design three domain
generalization settings based on three 3D datasets, and BEV-DG outperforms
state-of-the-art competitors by large margins in all settings.
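The abstract's core idea is that pooling 2D and 3D features into shared bird's-eye-view cells tolerates small pixel-to-point projection errors that break point-to-point matching, and that BEV features can then drive a cross-domain contrastive objective. The sketch below is a minimal, hypothetical illustration of both ideas (the function names, grid parameters, and InfoNCE-style loss are assumptions for illustration, not the paper's actual BAF/BDCL modules):

```python
import numpy as np

def bev_area_pool(points_xy, point_feats, grid_size=0.4, bev_hw=(64, 64)):
    """Average per-point features into BEV grid cells.
    Area-level pooling is more tolerant of small 2D-3D projection
    misalignment than point-to-point matching, since nearby points
    land in the same cell."""
    H, W = bev_hw
    # Quantize x/y coordinates into BEV cell indices.
    ix = np.clip((points_xy[:, 0] / grid_size).astype(int), 0, H - 1)
    iy = np.clip((points_xy[:, 1] / grid_size).astype(int), 0, W - 1)
    cell = ix * W + iy
    C = point_feats.shape[1]
    bev = np.zeros((H * W, C))
    cnt = np.zeros(H * W)
    np.add.at(bev, cell, point_feats)   # sum features per cell
    np.add.at(cnt, cell, 1.0)           # count points per cell
    nonempty = cnt > 0
    bev[nonempty] /= cnt[nonempty, None]
    return bev.reshape(H, W, C), nonempty.reshape(H, W)

def bev_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss over occupied BEV cells: row i of feats_a and
    row i of feats_b (e.g. the same cell seen by two modalities or
    domains) form a positive pair; all other rows are negatives."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax; positives lie on the diagonal.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Under this sketch, pooling both the projected-image features and the point-cloud features with `bev_area_pool` and fusing (e.g. averaging) them on occupied cells would play the role of area-to-area fusion, while `bev_contrastive_loss` between BEV features of different source domains would encourage domain-irrelevant representations.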
Related papers
- Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation [17.875516787157018]
We study how to harness the knowledge priors learned by 2D visual foundation models to produce more accurate labels for unlabeled target domains.
Our method is evaluated on various autonomous driving datasets and the results demonstrate a significant improvement for 3D segmentation task.
arXiv Detail & Related papers (2024-03-15T03:58:17Z)
- CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection [14.063365469339812]
LiDAR-based 3D Object Detection methods often do not generalize well to target domains outside the source (or training) data distribution.
We introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which leverages visual semantic cues from an image modality.
We also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features.
arXiv Detail & Related papers (2024-03-06T14:12:38Z)
- Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation [108.33885637197614]
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or unseen target domains.
We propose HRDA, a multi-resolution framework for UDA&DG, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention.
arXiv Detail & Related papers (2023-04-26T15:18:45Z)
- Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection [32.29833072399945]
We propose a Bi-domain active learning approach, namely Bi3D, to solve the cross-domain 3D object detection task.
Bi3D achieves a promising target-domain detection accuracy (89.63% on KITTI) compared with UDA-based work (84.29%), even surpassing the detector trained on the full set of the labeled target domain.
arXiv Detail & Related papers (2023-03-10T12:38:37Z)
- Geometry-Aware Network for Domain Adaptive Semantic Segmentation [64.00345743710653]
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-labels refinement in the target domain.
Our model outperforms the state of the art on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z)
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z)
- Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation [46.110739803985076]
We propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation.
For inter-domain cross modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data.
We evaluate our model under various multi-modality domain adaptation settings including day-to-night, country-to-country and dataset-to-dataset.
arXiv Detail & Related papers (2021-07-30T15:55:55Z)
- Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency [90.71745178767203]
Deep learning-based 3D object detection has achieved unprecedented success with the advent of large-scale autonomous driving datasets.
Existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.
We study a more realistic setting, unsupervised 3D domain adaptive detection, which only utilizes source domain annotations.
arXiv Detail & Related papers (2021-07-23T17:19:23Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.