CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
- URL: http://arxiv.org/abs/2009.00784v1
- Date: Wed, 2 Sep 2020 02:07:00 GMT
- Title: CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
- Authors: Su Pang, Daniel Morris, Hayder Radha
- Abstract summary: We propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network.
CLOCs fusion provides a low-complexity multi-modal fusion framework.
We show that CLOCs ranks highest among all fusion-based methods on the official KITTI leaderboard.
- Score: 13.986963122264633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There have been significant advances in neural networks for both 3D object
detection using LiDAR and 2D object detection using video. However, it has been
surprisingly difficult to train networks to effectively use both modalities in
a way that demonstrates gain over single-modality networks. In this paper, we
propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network. CLOCs
fusion provides a low-complexity multi-modal fusion framework that
significantly improves the performance of single-modality detectors. CLOCs
operates on the combined output candidates before Non-Maximum Suppression (NMS)
of any 2D and any 3D detector, and is trained to leverage their geometric and
semantic consistencies to produce more accurate final 3D and 2D detection
results. Our experimental evaluation on the challenging KITTI object detection
benchmark, including 3D and bird's eye view metrics, shows significant
improvements, especially at long distance, over the state-of-the-art
fusion-based methods. At the time of submission, CLOCs ranks highest among
all fusion-based methods on the official KITTI leaderboard. We will release
our code upon acceptance.
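The abstract describes CLOCs as operating on the pre-NMS candidates of any 2D and any 3D detector and scoring their geometric and semantic consistency. Below is a minimal sketch of that pairing step as I read it: every 2D candidate is paired with every projected 3D candidate, recording IoU, both detectors' confidences, and a normalized distance. The function names, exact feature set, and dense loop are illustrative, not the paper's implementation.

```python
import numpy as np

def iou_2d(a, b):
    """Axis-aligned IoU between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def build_fusion_tensor(boxes_2d, scores_2d, proj_boxes_3d, scores_3d, dists_3d):
    """Pair every 2D candidate with every projected 3D candidate.

    Each geometrically consistent pair contributes a feature vector
    [IoU, 2D score, 3D score, normalized distance]; non-overlapping
    pairs stay empty, so the tensor is sparse in practice.
    """
    k, n = len(boxes_2d), len(proj_boxes_3d)
    T = np.zeros((k, n, 4), dtype=np.float32)
    for i in range(k):
        for j in range(n):
            iou = iou_2d(boxes_2d[i], proj_boxes_3d[j])
            if iou > 0:
                T[i, j] = (iou, scores_2d[i], scores_3d[j], dists_3d[j])
    return T
```

In the paper a small fusion network then re-scores each 3D candidate from this tensor before NMS; the sketch stops at tensor construction.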
Related papers
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their RoI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z) - GOOD: General Optimization-based Fusion for 3D Object Detection via
LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection is a core component of the perception stack in autonomous driving.
GOOD is a general optimization-based fusion framework that achieves satisfactory detection without training additional models.
Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
arXiv Detail & Related papers (2023-03-17T07:05:04Z) - Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection [11.575945934519442]
LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving.
Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds.
We propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), which fuses semantic information from both 2D image and 3D point-cloud scene-parsing results.
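One widely used way to integrate 2D semantics into a LiDAR point cloud, as this entry describes, is to "paint" each point with the class scores of the pixel it projects to. A minimal sketch under that assumption; the projection matrix and score map are generic inputs, not MSF's actual interface.

```python
import numpy as np

def paint_points(points, sem_scores, P):
    """Append per-pixel class scores to LiDAR points ("painting").

    points:     (N, 3) xyz coordinates in the camera frame (assumed input)
    sem_scores: (H, W, C) per-pixel class scores from a 2D scene parser
    P:          (3, 4) camera projection matrix
    Returns an (N, 3 + C) array: xyz plus the painted class scores.
    """
    N = points.shape[0]
    H, W, C = sem_scores.shape
    homog = np.hstack([points, np.ones((N, 1))])   # (N, 4) homogeneous coords
    uvw = homog @ P.T                              # (N, 3) projective pixels
    z = uvw[:, 2]
    u = uvw[:, 0] / np.where(z > 0, z, 1.0)        # guard against z <= 0
    v = uvw[:, 1] / np.where(z > 0, z, 1.0)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    painted = np.zeros((N, 3 + C), dtype=np.float32)
    painted[:, :3] = points
    painted[valid, 3:] = sem_scores[v[valid].astype(int), u[valid].astype(int)]
    return painted
```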
arXiv Detail & Related papers (2022-12-10T10:54:41Z) - MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous
Driving [0.0]
We propose MSF3DDETR, a Multi-Sensor Fusion 3D Detection Transformer architecture that fuses image and LiDAR features to improve detection accuracy.
Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
The MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian-algorithm-based bipartite matching and a set-to-set loss inspired by DETR.
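DETR-style training requires a one-to-one assignment between predictions and ground-truth boxes before the set-to-set loss can be computed. Below is a minimal sketch of that matching step using SciPy's Hungarian solver; the cost here (L1 box error minus true-class probability) is a simple stand-in for MSF3DDETR's actual cost terms, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, pred_logits, gt_boxes, gt_labels):
    """One-to-one assignment of predicted boxes to ground-truth boxes.

    pred_boxes:  (Q, 7) predicted 3D boxes (x, y, z, w, l, h, yaw)
    pred_logits: (Q, C) class logits
    gt_boxes:    (G, 7) ground-truth boxes; gt_labels: (G,) int class ids
    Returns (rows, cols): prediction rows[i] is matched to gt cols[i].
    """
    # Numerically stabilized softmax over classes.
    shifted = pred_logits - pred_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # Cost of pairing prediction q with gt g: L1 box error minus true-class score.
    l1 = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (Q, G)
    cls = -probs[:, gt_labels]                                          # (Q, G)
    return linear_sum_assignment(l1 + cls)
```

Unmatched predictions are typically supervised toward a "no object" class; that part of the loss is omitted here.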
arXiv Detail & Related papers (2022-10-27T10:55:15Z) - FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for distant points, makes it difficult for a LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images within Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches the semantic information of point features with image features in a cascaded, bi-directional interaction manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z) - Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that exploits both LiDAR and cameras to perform highly accurate localization.
We design an end-to-end learnable architecture that uses continuous convolutions to fuse image and LiDAR feature maps at multiple levels of resolution.
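A continuous convolution in this context gathers, for each target BEV location, the features of its nearest LiDAR points (which carry image features sampled at their projections) and aggregates them with a learned function of the geometric offsets. A toy sketch of that gather-and-aggregate step; K, the random weights, and the single linear layer are placeholders for the paper's learned MLP.

```python
import numpy as np

def continuous_fusion(bev_xy, src_xy, src_feats, K=3):
    """Fuse source (image-derived) features onto BEV target locations.

    bev_xy:    (M, 2) target locations on the BEV grid
    src_xy:    (N, 2) BEV locations of LiDAR points carrying image features
    src_feats: (N, F) image features sampled at the points' projections
    For each target, take the K nearest sources, concatenate each
    feature with its geometric offset, and aggregate with a shared
    linear layer (stand-in for the paper's MLP).
    """
    M, F = bev_xy.shape[0], src_feats.shape[1]
    rng = np.random.default_rng(0)
    W = rng.standard_normal((F + 2, F)) * 0.1   # shared "MLP" weights
    out = np.zeros((M, F))
    for m in range(M):
        d = np.linalg.norm(src_xy - bev_xy[m], axis=1)
        nn = np.argsort(d)[:K]                       # K nearest sources
        offs = src_xy[nn] - bev_xy[m]                # (K, 2) geometric offsets
        h = np.hstack([src_feats[nn], offs]) @ W     # (K, F)
        out[m] = np.maximum(h, 0).sum(0)             # ReLU then sum-pool
    return out
```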
arXiv Detail & Related papers (2020-12-20T18:43:41Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
Fusion is performed at both stages of the architecture.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
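The depth-map-to-point-cloud conversion mentioned above is standard pinhole back-projection; a minimal sketch assuming known camera intrinsics fx, fy, cx, cy:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Each pixel (u, v) with depth z maps to camera coordinates
    x = (u - cx) * z / fx,  y = (v - cy) * z / fy.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

The CoR contribution of this paper is making this representation change differentiable so detection gradients reach the depth estimator; the sketch covers only the forward conversion.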
arXiv Detail & Related papers (2020-04-07T02:18:38Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework.
Specifically, we design a thorough perturbation scheme to enhance the generalization of the network on unlabeled and previously unseen data.
Our SESS achieves performance competitive with the state-of-the-art fully-supervised method while using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)