2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
- URL: http://arxiv.org/abs/2207.04397v1
- Date: Sun, 10 Jul 2022 06:52:09 GMT
- Title: 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
- Authors: Xu Yan, Jiantao Gao, Chaoda Zheng, Chao Zheng, Ruimao Zhang, Shuguang Cui, Zhen Li
- Abstract summary: We propose 2D Priors Assisted Semantic Segmentation (2DPASS) to boost representation learning on point clouds.
2DPASS acquires richer semantic and structural information from the multi-modal data, which is then distilled online into the pure 3D network.
It achieves state-of-the-art results on two large-scale benchmarks.
- Score: 18.321397768570154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As camera and LiDAR sensors capture complementary information used in
autonomous driving, great efforts have been made to develop semantic
segmentation algorithms through multi-modality data fusion. However,
fusion-based approaches require paired data, i.e., LiDAR point clouds and
camera images with strict point-to-pixel mappings, as the inputs in both
training and inference, which seriously hinders their application in practical
scenarios. Thus, in this work, we propose the 2D Priors Assisted Semantic
Segmentation (2DPASS), a general training scheme, to boost the representation
learning on point clouds, by fully taking advantage of 2D images with rich
appearance. In practice, by leveraging auxiliary modal fusion and multi-scale
fusion-to-single knowledge distillation (MSFSKD), 2DPASS acquires richer
semantic and structural information from the multi-modal data, which is then
distilled online into the pure 3D network. As a result, equipped with 2DPASS,
our baseline shows significant improvement with only point cloud inputs.
Specifically, it achieves state-of-the-art results on two large-scale
benchmarks (i.e., SemanticKITTI and nuScenes), including the top-1 results in
both the single-scan and multi-scan competitions of SemanticKITTI.
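For intuition only, the following is a minimal PyTorch sketch of the fusion-then-distill idea stated in the abstract: during training, a fused 2D+3D branch acts as a teacher whose predictions are distilled online into the pure 3D branch, which is all that remains at inference. The multi-scale aspect of MSFSKD is omitted, and all module names, dimensions, and loss terms here are assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): online distillation of
# fused 2D+3D "teacher" logits into a pure 3D "student" branch, so that
# inference needs point clouds only. Shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionToSingleDistill(nn.Module):
    def __init__(self, dim_3d=64, dim_2d=64, num_classes=20):
        super().__init__()
        self.fuse = nn.Linear(dim_3d + dim_2d, dim_3d)     # auxiliary modal fusion
        self.head_3d = nn.Linear(dim_3d, num_classes)      # pure 3D branch (kept at inference)
        self.head_fused = nn.Linear(dim_3d, num_classes)   # fused branch (training only)

    def forward(self, feat_3d, feat_2d, labels):
        # feat_3d: (N, dim_3d) per-point features; feat_2d: (N, dim_2d) features
        # gathered from the image pixels that correspond to the N points.
        fused = F.relu(self.fuse(torch.cat([feat_3d, feat_2d], dim=-1)))
        logits_3d = self.head_3d(feat_3d)
        logits_fused = self.head_fused(fused)

        loss_seg = F.cross_entropy(logits_3d, labels) + F.cross_entropy(logits_fused, labels)
        # One-way KD: the fused (teacher) distribution supervises the 3D branch.
        loss_kd = F.kl_div(F.log_softmax(logits_3d, dim=-1),
                           F.softmax(logits_fused.detach(), dim=-1),
                           reduction="batchmean")
        return loss_seg + loss_kd

# Example usage with random tensors standing in for real network features.
model = FusionToSingleDistill()
loss = model(torch.randn(1024, 64), torch.randn(1024, 64), torch.randint(0, 20, (1024,)))
loss.backward()
```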
Related papers
- Cross-Modal Information-Guided Network using Contrastive Learning for
Point Cloud Registration [17.420425069785946]
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration.
We first incorporate images projected from the point clouds and fuse the cross-modal features using an attention mechanism.
We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
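A minimal sketch of the kind of cross-modal attention fusion described above, assuming PyTorch-style per-point and per-pixel features; the module, dimensions, and residual design are illustrative placeholders rather than CMIGNet's actual architecture.

```python
# Minimal sketch (not CMIGNet's code): fusing projected-image features with
# point features via cross-attention. Dimensions are placeholder assumptions.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, point_feat, image_feat):
        # point_feat: (B, N, dim) per-point features
        # image_feat: (B, M, dim) features of pixels projected from the cloud
        fused, _ = self.attn(query=point_feat, key=image_feat, value=image_feat)
        return self.norm(point_feat + fused)   # residual connection

fusion = CrossModalAttentionFusion()
out = fusion(torch.randn(2, 1024, 128), torch.randn(2, 4096, 128))  # (2, 1024, 128)
```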
arXiv Detail & Related papers (2023-11-02T12:56:47Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
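As a rough illustration of how gradients from an image-branch objective can reach a point-cloud backbone, the sketch below scatters point features onto the image plane and trains a small 2D head on a dense regression target (standing in for the paper's normalized local coordinate map). The backbone, scatter step, and shapes are simplifying assumptions, not UPIDet's implementation.

```python
# Illustrative sketch only: an auxiliary 2D objective whose gradients flow back
# into a 3D point backbone because the image branch consumes point features.
import torch
import torch.nn as nn
import torch.nn.functional as F

point_backbone = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
image_head = nn.Conv2d(64, 2, kernel_size=1)     # predicts a 2-channel dense 2D target

points = torch.randn(4096, 3)                    # (N, 3) LiDAR points
pixel_uv = torch.randint(0, 64, (4096, 2))       # assumed point-to-pixel mapping into a 64x64 grid
target_map = torch.randn(1, 2, 64, 64)           # assumed dense 2D regression target

feat = point_backbone(points)                    # (N, 64) point features
# Scatter point features onto the image plane (last point wins per pixel; a simplification).
canvas = torch.zeros(64, 64, 64).index_put((pixel_uv[:, 1], pixel_uv[:, 0]), feat)
pred_map = image_head(canvas.permute(2, 0, 1).unsqueeze(0))   # (1, 2, 64, 64)

loss_2d = F.mse_loss(pred_map, target_map)
loss_2d.backward()   # gradients reach point_backbone through the scattered features
```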
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from the 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from rich information contained in 2D teacher models.
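A sketch under assumptions (not the paper's code) of how learnable concept queries could cross-attend to 3D point features and be aligned with features from a frozen 2D teacher; the number of concept slots and the cosine-based alignment loss are illustrative choices.

```python
# Sketch under assumptions: learnable "concept" queries cross-attend to 3D point
# features, and the resulting concept features are aligned with 2D teacher features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptDistiller(nn.Module):
    def __init__(self, dim=256, num_concepts=16):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(num_concepts, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, point_feat, teacher_feat_2d):
        # point_feat: (B, N, dim) student 3D features;
        # teacher_feat_2d: (B, K, dim) teacher features pooled to K concept slots (assumed).
        B = point_feat.size(0)
        queries = self.concepts.unsqueeze(0).expand(B, -1, -1)
        concept_feat, _ = self.attn(queries, point_feat, point_feat)   # (B, K, dim)
        # Align student concepts with the (detached) 2D teacher features.
        return 1 - F.cosine_similarity(concept_feat, teacher_feat_2d.detach(), dim=-1).mean()

distiller = ConceptDistiller()
loss = distiller(torch.randn(2, 1024, 256), torch.randn(2, 16, 256))
loss.backward()
```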
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
- LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving [34.119642131912485]
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models.
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the cost of the dense 2D masks.
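To make the weak-supervision idea concrete, here is an illustrative sketch in which LiDAR points inside an annotated 3D box are projected to the image and used as sparse positive pixel labels for a 2D mask prediction. The projection matrix, clamping, and loss are placeholder assumptions, not the LWSIS pipeline.

```python
# Illustrative sketch only: LiDAR points inside a 3D box become sparse pixel-level
# supervision for a 2D instance mask. Projection and box filtering are assumed done.
import torch
import torch.nn.functional as F

def project_to_image(points, proj):            # points: (N, 3), proj: (3, 4) camera matrix
    homo = torch.cat([points, torch.ones(len(points), 1)], dim=1)      # (N, 4)
    uvw = homo @ proj.T                                                # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                    # (N, 2) pixel coords

def weak_mask_loss(mask_logits, points_in_box, proj, H=256, W=256):
    # mask_logits: (H, W) predicted instance mask; points_in_box: LiDAR points
    # already filtered to lie inside one ground-truth 3D box.
    uv = project_to_image(points_in_box, proj).round().long()
    uv[:, 0] = uv[:, 0].clamp(0, W - 1)
    uv[:, 1] = uv[:, 1].clamp(0, H - 1)
    pos_logits = mask_logits[uv[:, 1], uv[:, 0]]
    # Projected foreground points act as (noisy) positive pixel labels.
    return F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))

loss = weak_mask_loss(torch.randn(256, 256, requires_grad=True),
                      torch.rand(500, 3) * 10,
                      torch.randn(3, 4))
loss.backward()
```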
arXiv Detail & Related papers (2022-12-07T08:08:01Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
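As a generic example of an inter-modal contrastive objective of the sort this summary mentions (not SimIPU's implementation), the following InfoNCE loss pulls each point feature toward its matched pixel feature and pushes it away from the other pixels in the batch; the pairing is assumed to come from a point-to-pixel correspondence.

```python
# Minimal sketch: inter-modal InfoNCE between matched point and pixel features.
import torch
import torch.nn.functional as F

def inter_modal_infonce(point_feat, pixel_feat, temperature=0.07):
    # point_feat, pixel_feat: (N, D); row i of each is assumed to be a matched pair.
    p = F.normalize(point_feat, dim=-1)
    q = F.normalize(pixel_feat, dim=-1)
    logits = p @ q.T / temperature                  # (N, N) similarity matrix
    targets = torch.arange(len(p))                  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = inter_modal_infonce(torch.randn(256, 128, requires_grad=True),
                           torch.randn(256, 128))
loss.backward()
```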
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We adopt a late fusion strategy, first learning the geometric and contextual similarities between the input point clouds and the point clouds back-projected from 2D pixels.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
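One plausible reading of similarity-aware fusion, sketched below purely for illustration: a learned per-point score gates how much back-projected 2D information is added to the 3D feature, so unreliable 2D evidence is down-weighted. The gating network and dimensions are assumptions, not SAFNet's design.

```python
# Sketch under assumptions: a per-point similarity score gates the 2D contribution.
import torch
import torch.nn as nn

class SimilarityGatedFusion(nn.Module):
    def __init__(self, dim=96):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feat_3d, feat_2d):
        # feat_3d: (N, dim) from the point branch; feat_2d: (N, dim) back-projected
        # from the 2D branch; points with no valid pixel could carry zero vectors.
        gate = self.score(torch.cat([feat_3d, feat_2d], dim=-1))   # (N, 1) in [0, 1]
        return feat_3d + gate * feat_2d

fusion = SimilarityGatedFusion()
out = fusion(torch.randn(2048, 96), torch.randn(2048, 96))         # (2048, 96)
```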
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
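For illustration, here is a residual-based fusion block of the general kind this summary describes, assuming both streams have already been brought into a shared 2D view; the channel sizes and single-block design are assumptions rather than PMF's actual modules.

```python
# Illustrative sketch: camera features refine LiDAR features via a learned residual.
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, lidar_feat, camera_feat):
        # Both inputs are (B, C, H, W) feature maps in a shared (e.g. perspective) view.
        residual = self.mix(torch.cat([lidar_feat, camera_feat], dim=1))
        return lidar_feat + residual    # camera stream contributes a learned correction

block = ResidualFusionBlock()
out = block(torch.randn(1, 64, 64, 512), torch.randn(1, 64, 64, 512))   # (1, 64, 64, 512)
```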
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.