Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2308.02883v1
- Date: Sat, 5 Aug 2023 14:00:05 GMT
- Title: Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic
Segmentation
- Authors: Yiyang Chen, Shanshan Zhao, Changxing Ding, Liyao Tang, Chaoyue Wang,
Dacheng Tao
- Abstract summary: Cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain.
This paper studies a new 3DLSS setting in which a 2D dataset (source) with semantic annotations and paired but unannotated 2D image and 3D LiDAR data (target) are available.
To achieve 3DLSS in this scenario, we propose Cross-Modal and Cross-Domain Learning (CoMoDaL).
- Score: 82.47872784972861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, cross-modal domain adaptation has been studied on the paired
2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic
segmentation (3DLSS) in the target domain. However, in such a setting the
paired 2D and 3D data in the source domain are still collected with additional
effort. Since the 2D-3D projections can enable the 3D model to learn semantic
information from the 2D counterpart, we ask whether we can further remove the
need for source 3D data and rely only on the source 2D images. To answer this
question, this paper studies a new 3DLSS setting in which a 2D dataset (source)
with semantic annotations and paired but unannotated 2D image and 3D LiDAR data
(target) are available. To achieve 3DLSS in this scenario, we propose
Cross-Modal and Cross-Domain Learning (CoMoDaL). Specifically, our CoMoDaL aims
at modeling 1) inter-modal cross-domain distillation between the unpaired
source 2D image and target 3D LiDAR data, and 2) the intra-domain cross-modal
guidance between the target 2D image and 3D LiDAR data pair. In CoMoDaL, we
propose to apply several constraints, such as point-to-pixel and
prototype-to-pixel alignments, to associate the semantics in different
modalities and domains by constructing mixed samples in two modalities. The
experimental results on several datasets show that in the proposed setting, the
developed CoMoDaL can achieve segmentation without the supervision of labeled
LiDAR data. Ablation studies are also conducted to provide further analysis. Code will be
made publicly available.
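The abstract does not detail the individual losses. As a rough illustration of the kind of point-to-pixel alignment constraint mentioned above (distilling 2D predictions into the 3D branch at projected point locations), a minimal PyTorch sketch could look like the following; the function name, tensor shapes, and the choice of a KL-divergence objective with the 2D branch as teacher are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def point_to_pixel_alignment_loss(logits_3d: torch.Tensor,
                                  logits_2d: torch.Tensor,
                                  pixel_uv: torch.Tensor) -> torch.Tensor:
    """Hypothetical point-to-pixel alignment term (illustrative, not the paper's exact loss).

    logits_3d: (N, C) class logits from the 3D network for N LiDAR points.
    logits_2d: (C, H, W) class logits from the 2D network for the paired image.
    pixel_uv:  (N, 2) integer (u, v) image coordinates of the projected 3D points.
    """
    u = pixel_uv[:, 0].long()
    v = pixel_uv[:, 1].long()
    # Sample the 2D predictions at the pixels hit by the projected LiDAR points.
    logits_2d_at_points = logits_2d[:, v, u].t()            # (N, C)
    # Treat the (detached) 2D branch as the teacher and pull the 3D branch toward it.
    teacher = F.softmax(logits_2d_at_points.detach(), dim=1)
    student = F.log_softmax(logits_3d, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Usage sketch (shapes are illustrative):
# loss = point_to_pixel_alignment_loss(point_logits, image_logits, projected_uv)
```

An analogous prototype-to-pixel term and the mixed-sample construction would sit on top of this, but their exact forms are not given in the abstract.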
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on three real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z) - Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation [68.60747298865394]
We propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D).
Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data.
This transformation enables seamless integration of 2D and 3D data and facilitates cross-dimensional self-supervised learning for 3D medical image analysis (a rough sketch of the im2col idea appears after this list).
arXiv Detail & Related papers (2024-06-03T02:57:25Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z) - LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for
Autonomous Driving [34.119642131912485]
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models.
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the cost of annotating dense 2D masks.
arXiv Detail & Related papers (2022-12-07T08:08:01Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize 3D voxelization and 3D convolution networks.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal
Learning in Domain Adaptation for 3D Semantic Segmentation [46.110739803985076]
We propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation.
For inter-domain cross modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data.
We evaluate our model under various multi-modality domain adaptation settings including day-to-night, country-to-country and dataset-to-dataset.
arXiv Detail & Related papers (2021-07-30T15:55:55Z) - Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
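The CDSSL-P3D entry above mentions an im2col-based transformation that converts 2D images into a 3D-compatible format. The following is a minimal sketch of that general idea, assuming a plain PyTorch unfold-based im2col followed by a reshape into a pseudo-3D volume; the actual transformation in that paper may differ.

```python
import torch
import torch.nn.functional as F

def im2col_pseudo_3d(image: torch.Tensor, patch_size: int = 4, stride: int = 4) -> torch.Tensor:
    """Rearrange a 2D image into a pseudo-3D volume via im2col (illustrative only).

    image: (C, H, W) tensor.
    Returns a (C * patch_size**2, H_out, W_out) volume in which each spatial
    location stacks the pixels of one patch along a depth-like axis.
    """
    c, h, w = image.shape
    # im2col: extract strided patches as columns.
    cols = F.unfold(image.unsqueeze(0), kernel_size=patch_size, stride=stride)  # (1, C*p*p, L)
    h_out = (h - patch_size) // stride + 1
    w_out = (w - patch_size) // stride + 1
    return cols.view(c * patch_size * patch_size, h_out, w_out)

# Usage sketch: a single-channel 64x64 slice becomes a 16x16x16 "volume".
# volume = im2col_pseudo_3d(torch.randn(1, 64, 64))
```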
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.