Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal
Learning in Domain Adaptation for 3D Semantic Segmentation
- URL: http://arxiv.org/abs/2107.14724v2
- Date: Mon, 2 Aug 2021 04:40:39 GMT
- Title: Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal
Learning in Domain Adaptation for 3D Semantic Segmentation
- Authors: Duo Peng, Yinjie Lei, Wen Li, Pingping Zhang and Yulan Guo
- Abstract summary: We propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation.
For inter-domain cross modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data.
We evaluate our model under various multi-modality domain adaptation settings including day-to-night, country-to-country and dataset-to-dataset.
- Score: 46.110739803985076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain adaptation is critical for success when confronted with a lack of
annotations in a new domain. Because labeling 3D point clouds is extremely
time-consuming, domain adaptation for 3D semantic segmentation is highly
desirable. With the rise of multi-modal datasets, large amounts of 2D images
are accessible alongside 3D point clouds. In light of this, we propose to further
leverage 2D data for 3D domain adaptation through intra-domain and inter-domain
cross-modal learning. For intra-domain cross-modal learning, most existing works
down-sample the dense 2D pixel-wise features to the same size as the sparse 3D
point-wise features, discarding numerous useful 2D features. To address this
problem, we propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to
increase the sufficiency of multi-modality information interaction for domain
adaptation. For inter-domain cross-modal learning, we further advance Cross
Modal Adversarial Learning (CMAL) on 2D and 3D data, which contain different
semantic content, aiming to promote high-level modal complementarity. We
evaluate our model under various multi-modality domain adaptation settings,
including day-to-night, country-to-country and dataset-to-dataset, and it brings
large improvements over both uni-modal and multi-modal domain adaptation
methods in all settings.
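To make the sparse-to-dense idea concrete, below is a minimal, hypothetical NumPy sketch (not the authors' DsCML implementation) contrasting the common practice of sampling a single 2D pixel feature per projected 3D point with a sparse-to-dense style matching in which each 3D point aggregates a local window of 2D features, so that many more pixel-wise features take part in the cross-modal interaction. The function names, the window size k, and the use of simple average pooling are illustrative assumptions; DsCML replaces the fixed pooling with a learned, dynamic interaction.

```python
import numpy as np

def sample_point_to_pixel(feat_2d, uv):
    """Common practice: one 2D pixel feature per projected 3D point.

    feat_2d: (H, W, C) dense 2D feature map
    uv:      (N, 2) integer pixel coordinates of N projected 3D points
    returns: (N, C) sparse 2D features; most of feat_2d is discarded
    """
    return feat_2d[uv[:, 1], uv[:, 0]]

def sample_sparse_to_dense(feat_2d, uv, k=3):
    """Sparse-to-dense style matching (illustrative, not the paper's DsCML):
    each 3D point aggregates a k x k window of 2D features around its
    projection, so many more pixel-wise features interact with the point.

    returns: (N, C) aggregated 2D features
    """
    H, W, C = feat_2d.shape
    r = k // 2
    out = np.empty((uv.shape[0], C), dtype=feat_2d.dtype)
    for i, (u, v) in enumerate(uv):
        u0, u1 = max(u - r, 0), min(u + r + 1, W)
        v0, v1 = max(v - r, 0), min(v + r + 1, H)
        # Simple average pooling over the window; the actual DsCML module
        # would weight these pixels dynamically (assumption for illustration).
        out[i] = feat_2d[v0:v1, u0:u1].reshape(-1, C).mean(axis=0)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_2d = rng.standard_normal((64, 128, 16)).astype(np.float32)  # dense 2D features
    uv = np.stack([rng.integers(0, 128, 100), rng.integers(0, 64, 100)], axis=1)
    print(sample_point_to_pixel(feat_2d, uv).shape)   # (100, 16)
    print(sample_sparse_to_dense(feat_2d, uv).shape)  # (100, 16)
```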
Related papers
- One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection [71.78795573911512]
We propose OneDet3D, a universal one-for-all model that addresses 3D detection across different domains.
We propose domain-aware partitioning in scatter and context, guided by a routing mechanism, to address the data interference issue.
The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities.
arXiv Detail & Related papers (2024-11-03T14:21:56Z)
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
- BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation [59.99683295806698]
Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the complementarity of 2D-3D data to overcome the lack of annotation in a new domain.
We propose cross-modal learning under bird's-eye view for Domain Generalization (DG) of 3D semantic segmentation, called BEV-DG.
arXiv Detail & Related papers (2023-08-12T11:09:17Z)
- Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation [82.47872784972861]
Cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain.
This paper studies a new 3DLSS setting where a 2D image dataset with semantic annotations (source) and paired but unannotated 2D images and 3D LiDAR data (target) are available.
To achieve 3DLSS in this scenario, we propose Cross-Modal and Cross-Domain Learning (CoMoDaL).
arXiv Detail & Related papers (2023-08-05T14:00:05Z)
- Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation [14.30113021974841]
3D semantic segmentation is a critical task in many real-world applications, such as autonomous driving, robotics, and mixed reality.
A possible solution is to combine the 3D information with others coming from sensors featuring a different modality, such as RGB cameras.
Recent multi-modal 3D semantic segmentation networks exploit these modalities relying on two branches that process the 2D and 3D information independently.
arXiv Detail & Related papers (2023-04-06T10:59:43Z)
- SSDA3D: Semi-supervised Domain Adaptation for 3D Object Detection from Point Cloud [125.9472454212909]
We present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D).
SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage.
Experiments show that, with only 10% labeled target data, our SSDA3D can surpass the fully-supervised oracle model trained with 100% target labels.
arXiv Detail & Related papers (2022-12-06T09:32:44Z)
- Multimodal Semi-Supervised Learning for 3D Objects [19.409295848915388]
This paper explores how the coherence of different modalities of 3D data can be used to improve data efficiency for both 3D classification and retrieval tasks.
We propose a novel multimodal semi-supervised learning framework by introducing an instance-level consistency constraint and a novel multimodal contrastive prototype (M2CP) loss.
Our proposed framework significantly outperforms all the state-of-the-art counterparts for both classification and retrieval tasks by a large margin on the ModelNet10 and ModelNet40 datasets.
arXiv Detail & Related papers (2021-10-22T05:33:16Z)
- Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences [32.01548991331616]
This paper presents a novel self-supervised learning approach to learn both 2D image features and 3D point cloud features.
It exploits cross-modality and cross-view correspondences without using any annotated human labels.
The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks.
arXiv Detail & Related papers (2020-04-13T02:57:25Z)