FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection
- URL: http://arxiv.org/abs/2209.10733v1
- Date: Thu, 22 Sep 2022 02:07:25 GMT
- Title: FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection
- Authors: Xinli Xu, Shaocong Dong, Lihe Ding, Jie Wang, Tingfa Xu, Jianan Li
- Abstract summary: Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.
- Score: 11.962073589763676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection with multi-sensors is essential for an accurate and
reliable perception system of autonomous driving and robotics. Existing 3D
detectors significantly improve accuracy by adopting a two-stage paradigm
that relies merely on LiDAR point clouds for 3D proposal refinement. Though
impressive, the sparsity of point clouds, especially for points far away,
makes it difficult for the LiDAR-only refinement module to accurately
recognize and locate objects. To address this problem, we propose a novel
multi-modality two-stage approach named FusionRCNN, which effectively and
efficiently fuses point clouds and camera images in the Regions of
Interest (RoI). FusionRCNN adaptively integrates both sparse geometry
information from LiDAR and dense texture information from camera in a unified
attention mechanism. Specifically, it first utilizes RoIPooling to obtain an
image set with a unified size and gets the point set by sampling raw points
within proposals in the RoI extraction step; it then leverages intra-modality
self-attention to enhance the domain-specific features, followed by a
well-designed cross-attention to fuse the information from the two
modalities. FusionRCNN is fundamentally plug-and-play and supports different
one-stage methods with almost no architectural changes. Extensive experiments
on the KITTI and Waymo benchmarks demonstrate that our method significantly
boosts the performance of popular detectors. Remarkably, FusionRCNN
significantly improves the strong SECOND baseline by 6.14% mAP on Waymo and
outperforms competing two-stage approaches. Code will be released soon at
https://github.com/xxlbigbrother/Fusion-RCNN.
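
The abstract describes two concrete steps: an RoI extraction step (RoI-pooling image features into a unified-size image set and sampling raw points within each proposal) and an attention step (intra-modality self-attention followed by cross-attention). The two PyTorch sketches below illustrate these steps under stated assumptions; all function names, shapes, and layer choices (a 7x7 pooled crop, an axis-aligned point-in-box test, `nn.MultiheadAttention`) are illustrative guesses, not the authors' implementation, which is the code linked above.

```python
import torch
from torchvision.ops import roi_align


def extract_roi_inputs(image_feat_map, boxes_2d, points, boxes_3d, num_points=64):
    """Sketch of the RoI extraction step (hypothetical helper, not the paper's code).

    image_feat_map: (1, C, H, W) camera feature map
    boxes_2d:       (num_rois, 4) projected 2D proposal boxes as (x1, y1, x2, y2)
    points:         (N, 3+) raw LiDAR points
    boxes_3d:       (num_rois, 6) proposals as (cx, cy, cz, dx, dy, dz);
                    yaw is ignored here to keep the inside-box test simple
    """
    # "Image set with a unified size": one 7x7 crop per proposal.
    image_set = roi_align(image_feat_map, [boxes_2d], output_size=(7, 7))

    point_sets = []
    for box in boxes_3d:
        center, size = box[:3], box[3:6]
        inside = ((points[:, :3] - center).abs() <= size / 2).all(dim=1)
        idx = inside.nonzero(as_tuple=True)[0]
        if idx.numel() == 0:  # empty proposal: pad with zeros
            point_sets.append(points.new_zeros(num_points, points.shape[1]))
            continue
        # Sample with replacement to a fixed count so RoIs batch uniformly.
        choice = idx[torch.randint(idx.numel(), (num_points,))]
        point_sets.append(points[choice])
    return image_set, torch.stack(point_sets)
```

Once both modalities are embedded to a common width, the fusion itself can be expressed as one self-attention block per modality plus a cross-attention in which the sparse point tokens query the dense image tokens:

```python
import torch
import torch.nn as nn


class RoIFusionBlock(nn.Module):
    """Sketch of per-RoI self- and cross-attention fusion (illustrative only)."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # Intra-modality self-attention enhances domain-specific features.
        self.point_self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention: sparse geometry queries dense texture.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, point_tokens, image_tokens):
        # point_tokens: (num_rois, num_points, d_model)
        # image_tokens: (num_rois, h * w, d_model), e.g. 49 tokens from a 7x7 crop
        p, _ = self.point_self_attn(point_tokens, point_tokens, point_tokens)
        i, _ = self.image_self_attn(image_tokens, image_tokens, image_tokens)
        fused, _ = self.cross_attn(p, i, i)  # point tokens attend to image tokens
        fused = self.norm1(fused + p)        # residual keeps the LiDAR signal
        return self.norm2(fused + self.ffn(fused))


# Toy usage: 128 proposals, 64 sampled points each, 7x7 crops flattened to 49 tokens.
block = RoIFusionBlock()
out = block(torch.randn(128, 64, 256), torch.randn(128, 49, 256))
print(out.shape)  # torch.Size([128, 64, 256])
```

Feeding the fused tokens into confidence and box-refinement heads would complete the second stage; since the block consumes only proposals plus raw sensor data, it can in principle sit on top of any one-stage detector, which matches the plug-and-play claim in the abstract.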
Related papers
FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
The corresponding indexes between different sensor signals are established in advance in the data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
arXiv Detail & Related papers (2022-09-15T16:13:19Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion(CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region of interest (RoI) -pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
- FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection [15.641616738865276]
We propose a general multimodal fusion framework FusionPainting to fuse the 2D RGB image and 3D point clouds at a semantic level for boosting the 3D object detection task.
Especially, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector.
The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark.
arXiv Detail & Related papers (2021-06-23T14:53:22Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.