Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection
- URL: http://arxiv.org/abs/2205.14951v1
- Date: Mon, 30 May 2022 09:35:37 GMT
- Title: Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection
- Authors: Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu
Xia, Tingting Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang,
Xiaodan Liang, Bing Wang
- Abstract summary: Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
- Score: 58.81316192862618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are two critical sensors for 3D perception in autonomous driving, the
camera and the LiDAR. The camera provides rich semantic information such as
color, texture, and the LiDAR reflects the 3D shape and locations of
surrounding objects. People discover that fusing these two modalities can
significantly boost the performance of 3D perception models as each modality
has complementary information to the other. However, we observe that current
datasets are captured from expensive vehicles that are explicitly designed for
data collection purposes, and cannot truly reflect the realistic data
distribution due to various reasons. To this end, we collect a series of
real-world cases with noisy data distribution, and systematically formulate a
robustness benchmark toolkit, that simulates these cases on any clean
autonomous driving datasets. We showcase the effectiveness of our toolkit by
establishing the robustness benchmark on two widely-adopted autonomous driving
datasets, nuScenes and Waymo, then, to the best of our knowledge, holistically
benchmark the state-of-the-art fusion methods for the first time. We observe
that: i) most fusion methods, when solely developed on these data, tend to fail
inevitably when there is a disruption to the LiDAR input; ii) the improvement
of the camera input is significantly inferior to the LiDAR one. We further
propose an efficient robust training strategy to improve the robustness of the
current fusion method. The benchmark and code are available at
https://github.com/kcyu2014/lidar-camera-robust-benchmark
Related papers
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z) - ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and
Spatio-Temporal Affinities for 3D Multi-Object Tracking [26.976216624424385]
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene.
We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
arXiv Detail & Related papers (2023-10-04T02:17:59Z) - Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combing their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z) - SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye
View Representation for 3D Object Detection [14.706717531900708]
LiDAR and camera are two essential sensors for 3D object detection in autonomous driving.
Recent methods focus on point-level fusion which paints the LiDAR point cloud with camera features in the perspective view.
We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation.
arXiv Detail & Related papers (2022-12-09T05:48:58Z) - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
arXiv Detail & Related papers (2022-03-02T04:51:31Z) - Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm to combine accurate point clouds with dense but less accurate point clouds obtained from stereo pairs.
We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
arXiv Detail & Related papers (2021-11-08T19:29:59Z) - LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic
Segmentation [78.74202673902303]
We propose a coarse-tofine LiDAR and camera fusion-based network (termed as LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to the success of the effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z) - One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.