CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge
Distillation for LIDAR Semantic Segmentation
- URL: http://arxiv.org/abs/2307.04091v1
- Date: Sun, 9 Jul 2023 04:24:12 GMT
- Title: CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge
Distillation for LIDAR Semantic Segmentation
- Authors: Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo,
Yingya Zhang, Qifeng Chen
- Abstract summary: 2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles.
Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems.
We propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work.
- Score: 44.44327357717908
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 2D RGB images and 3D LIDAR point clouds provide complementary knowledge for
the perception system of autonomous vehicles. Several 2D and 3D fusion methods
have been explored for the LIDAR semantic segmentation task, but they suffer
from different problems. 2D-to-3D fusion methods require strictly paired data
during inference, which may not be available in real-world scenarios, while
3D-to-2D fusion methods cannot explicitly make full use of the 2D information.
Therefore, we propose a Bidirectional Fusion Network with Cross-Modality
Knowledge Distillation (CMDFusion) in this work. Our method has two
contributions. First, our bidirectional fusion scheme explicitly and implicitly
enhances the 3D features via 2D-to-3D and 3D-to-2D fusion, respectively,
which surpasses either single fusion scheme alone. Second, we distill the 2D
knowledge from a 2D network (Camera branch) into a 3D network (2D knowledge
branch) so that the 3D network can generate 2D information even for points
outside the camera's FOV (field of view). In this way, RGB images are no
longer required during inference, since the 2D knowledge branch provides 2D
information derived from the 3D LIDAR input alone. We show that our
CMDFusion achieves the best performance among all fusion-based methods on the
SemanticKITTI and nuScenes datasets. The code will be released at
https://github.com/Jun-CEN/CMDFusion.
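To make the second contribution concrete, the following is a minimal PyTorch sketch of the distillation step, not the authors' released code: the module and function names (CameraBranch, KnowledgeBranch, distillation_loss) and the toy backbones are assumptions for illustration. A 2D camera branch produces per-pixel features, each in-FOV point fetches the feature at its projected pixel, and a LIDAR-only branch is trained to reproduce those features so the camera can be dropped at inference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraBranch(nn.Module):
    """Toy 2D network: RGB image -> per-pixel feature map (the teacher)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )

    def forward(self, img):                 # img: (1, 3, H, W)
        return self.net(img)                # (1, C, H, W)

class KnowledgeBranch(nn.Module):
    """Toy 3D network: learns to mimic 2D features from LIDAR alone."""
    def __init__(self, in_dim=4, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, pts):                 # pts: (N, 4) xyz + intensity
        return self.net(pts)                # (N, C)

def distillation_loss(cam, kd, img, pts, uv, in_fov):
    """Match KD-branch features to camera features for in-FOV points.

    uv:     (N, 2) long tensor of projected pixel coords (u, v)
    in_fov: (N,) bool mask of points visible to the camera
    """
    with torch.no_grad():                   # teacher is not updated
        fmap = cam(img)[0]                  # (C, H, W)
    u, v = uv[in_fov, 0], uv[in_fov, 1]
    teacher = fmap[:, v, u].t()             # (M, C) sampled 2D features
    student = kd(pts)[in_fov]               # (M, C) predicted 2D features
    return F.mse_loss(student, teacher)

# Toy usage with random data; real uv/in_fov come from camera calibration.
cam, kd = CameraBranch(), KnowledgeBranch()
img = torch.rand(1, 3, 64, 128)
pts = torch.rand(100, 4)
uv = torch.stack([torch.randint(0, 128, (100,)),
                  torch.randint(0, 64, (100,))], dim=1)
in_fov = torch.rand(100) > 0.5
loss = distillation_loss(cam, kd, img, pts, uv, in_fov)
loss.backward()                             # gradients reach only the KD branch
```

Because only the LIDAR-driven branch is needed at test time, a scheme of this kind can supply 2D-style features even for points outside the camera FOV, which is the property the abstract highlights.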
Related papers
- Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic
Segmentation [82.47872784972861]
Cross-modal domain adaptation has been studied on paired 2D image and 3D LiDAR data to ease the labeling cost of 3D LiDAR semantic segmentation (3DLSS) in the target domain.
This paper studies a new 3DLSS setting in which a 2D dataset with semantic annotations and paired but unannotated 2D image and 3D LiDAR data (the target) are available.
To achieve 3DLSS in this scenario, we propose Cross-Modal and Cross-Domain Learning (CoMoDaL).
arXiv Detail & Related papers (2023-08-05T14:00:05Z)
- Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z)
- Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection [11.575945934519442]
LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving.
Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds.
We propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), which fuses semantic information from both 2D image and 3D point scene-parsing results.
arXiv Detail & Related papers (2022-12-10T10:54:41Z)
- Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm that combines accurate LiDAR point clouds with dense but less accurate point clouds obtained from stereo pairs.
We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
arXiv Detail & Related papers (2021-11-08T19:29:59Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place among all vision-only methods in the nuScenes 3D detection challenge at NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that leverages 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
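Most of the fusion methods above, CMDFusion included, depend on point-to-pixel correspondences obtained from camera calibration. As background for where the `uv` and `in_fov` inputs in the earlier sketch would come from, here is a minimal NumPy sketch of standard pinhole projection; the function name and calibration-matrix conventions are generic assumptions, not taken from any of these papers.

```python
import numpy as np

def project_points(pts_xyz, T_cam_lidar, K, img_hw):
    """Project LIDAR points into the image plane (pinhole camera model).

    pts_xyz:     (N, 3) point coordinates in the LIDAR frame
    T_cam_lidar: (4, 4) extrinsics, LIDAR frame -> camera frame
    K:           (3, 3) camera intrinsics
    img_hw:      (H, W) image size in pixels
    Returns (N, 2) integer pixel coordinates (u, v) and an (N,) in-FOV mask.
    """
    n = pts_xyz.shape[0]
    pts_h = np.hstack([pts_xyz, np.ones((n, 1))])        # homogeneous coords
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]               # points in camera frame
    depth = cam[:, 2]
    uvw = (K @ cam.T).T                                  # perspective projection
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)   # divide by depth
    uv = np.round(uv).astype(np.int64)
    h, w = img_hw
    in_fov = (
        (depth > 0)
        & (uv[:, 0] >= 0) & (uv[:, 0] < w)
        & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    )
    return uv, in_fov

# Toy usage with an identity extrinsic and a simple intrinsic matrix.
K = np.array([[500.0, 0.0, 64.0], [0.0, 500.0, 32.0], [0.0, 0.0, 1.0]])
uv, in_fov = project_points(np.random.rand(100, 3) * 10, np.eye(4), K, (64, 128))
```

Points behind the camera or projecting outside the image bounds are masked out; that out-of-FOV population is exactly what CMDFusion's 2D knowledge branch is designed to cover.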