Multi-View Adaptive Fusion Network for 3D Object Detection
- URL: http://arxiv.org/abs/2011.00652v2
- Date: Tue, 8 Dec 2020 03:54:51 GMT
- Title: Multi-View Adaptive Fusion Network for 3D Object Detection
- Authors: Guojun Wang, Bin Tian, Yachen Zhang, Long Chen, Dongpu Cao, Jian Wu
- Abstract summary: 3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate the attentive pointwise fusion (APF) and attentive pointwise weighting (APW) modules.
- Score: 14.506796247331584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection based on LiDAR-camera fusion is becoming an emerging
research theme for autonomous driving. However, it has been surprisingly
difficult to effectively fuse both modalities without information loss and
interference. To solve this issue, we propose a single-stage multi-view fusion
framework that takes LiDAR bird's-eye view, LiDAR range view and camera view
images as inputs for 3D object detection. To effectively fuse multi-view
features, we propose an attentive pointwise fusion (APF) module to estimate the
importance of the three sources with attention mechanisms that can achieve
adaptive fusion of multi-view features in a pointwise manner. Furthermore, an
attentive pointwise weighting (APW) module is designed to help the network
learn structure information and point feature importance with two extra tasks,
namely, foreground classification and center regression, and the predicted
foreground probability is used to reweight the point features. We design an
end-to-end learnable network named MVAF-Net to integrate these two components.
Our evaluations on the KITTI 3D object detection dataset demonstrate
that the proposed APF and APW modules offer significant performance gains.
Moreover, the proposed MVAF-Net achieves the best performance among all
single-stage fusion methods and outperforms most two-stage fusion methods,
achieving the best trade-off between speed and accuracy on the KITTI benchmark.
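The attentive pointwise fusion (APF) idea described in the abstract can be illustrated with a minimal NumPy sketch. It assumes that each point already has a feature vector sampled from the bird's-eye-view, range-view and camera-view feature maps, and that the importance of each view is scored by a single linear layer followed by a softmax over the three views. The function and parameter names (`attentive_pointwise_fusion`, `w_att`, `b_att`) are hypothetical, and MVAF-Net's actual layers may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pointwise_fusion(f_bev, f_rv, f_cam, w_att, b_att):
    """Hypothetical APF-style fusion (a sketch, not the authors' code).

    f_bev, f_rv, f_cam: (N, C) per-point features, assumed to be already
    sampled from the BEV, range-view and camera-view feature maps.
    w_att: (3*C, 3), b_att: (3,) form an assumed linear scoring layer.
    Returns fused features (N, 3*C) and per-point view weights (N, 3).
    """
    views = np.stack([f_bev, f_rv, f_cam], axis=1)    # (N, 3, C)
    n = views.shape[0]
    joint = views.reshape(n, -1)                        # (N, 3*C): all views per point
    weights = softmax(joint @ w_att + b_att, axis=-1)   # (N, 3), each row sums to 1
    weighted = views * weights[:, :, None]              # scale each view, point by point
    return weighted.reshape(n, -1), weights             # concatenate reweighted views

# Usage with random features for 5 points and 4 channels per view.
rng = np.random.default_rng(0)
N, C = 5, 4
f_bev, f_rv, f_cam = (rng.normal(size=(N, C)) for _ in range(3))
w_att = rng.normal(size=(3 * C, 3)) * 0.1
b_att = np.zeros(3)
fused, weights = attentive_pointwise_fusion(f_bev, f_rv, f_cam, w_att, b_att)
print(fused.shape, weights.shape)  # (5, 12) (5, 3)
```

Concatenating the reweighted views, rather than summing them, is a choice made for this sketch: each source keeps its own channels, so a view judged unreliable for a given point is attenuated instead of being mixed into the others.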
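The attentive pointwise weighting (APW) step can likewise be sketched as two auxiliary heads on the point features: a foreground classifier whose sigmoid probability rescales each point's features, and a regressor that predicts a 3D offset from the point to the center of its object. The linear heads, the binary cross-entropy and smooth-L1 losses, and restricting the center loss to foreground points are all assumptions of this sketch, as are the names `attentive_pointwise_weighting` and `apw_aux_losses`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentive_pointwise_weighting(feats, w_cls, b_cls, w_reg, b_reg):
    """Hypothetical APW-style step (a sketch, not the authors' code).

    feats: (N, D) point features; w_cls: (D, 1), b_cls: (1,) and
    w_reg: (D, 3), b_reg: (3,) are assumed linear heads.
    """
    fg_prob = sigmoid(feats @ w_cls + b_cls)[:, 0]    # (N,) foreground probability
    center_offset = feats @ w_reg + b_reg              # (N, 3) offset to object center
    reweighted = feats * fg_prob[:, None]              # down-weight likely background points
    return reweighted, fg_prob, center_offset

def apw_aux_losses(fg_prob, center_offset, fg_label, center_target, eps=1e-7):
    """Assumed auxiliary losses: BCE for foreground classification and
    smooth-L1 center regression averaged over foreground points only."""
    p = np.clip(fg_prob, eps, 1 - eps)
    bce = -np.mean(fg_label * np.log(p) + (1 - fg_label) * np.log(1 - p))
    diff = np.abs(center_offset - center_target)
    smooth_l1 = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum(axis=1)
    mask = fg_label > 0.5
    reg = smooth_l1[mask].mean() if mask.any() else 0.0
    return bce, reg

# Usage with random features for 6 points and 8 channels.
rng = np.random.default_rng(1)
N, D = 6, 8
feats = rng.normal(size=(N, D))
w_cls, b_cls = rng.normal(size=(D, 1)) * 0.1, np.zeros(1)
w_reg, b_reg = rng.normal(size=(D, 3)) * 0.1, np.zeros(3)
reweighted, fg_prob, center_offset = attentive_pointwise_weighting(feats, w_cls, b_cls, w_reg, b_reg)
fg_label = np.array([1, 0, 1, 0, 0, 1], dtype=float)
center_target = rng.normal(size=(N, 3))
bce, reg = apw_aux_losses(fg_prob, center_offset, fg_label, center_target)
print(reweighted.shape, fg_prob.shape, center_offset.shape)  # (6, 8) (6,) (6, 3)
```

In this sketch the two losses only shape the features during training; at inference only the reweighting by `fg_prob` affects the features passed on to the detection head.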
Related papers
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
Date: 2023-07-18T11:26:02Z
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
Date: 2022-11-01T20:59:38Z
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for distant points, makes it difficult for a LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoIs).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
Date: 2022-09-22T02:07:25Z
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
Date: 2021-12-28T07:31:18Z
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches point features with the plentiful semantic information of image features through cascaded bi-directional interaction between the two modalities.
Experimental results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over state-of-the-art methods.
Date: 2021-12-21T10:48:34Z
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region of interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
Date: 2021-08-29T15:40:15Z
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important for scene understanding in many applications, such as autonomous driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from both modalities.
Date: 2021-06-21T10:47:26Z
- IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a Single Image [37.83574424518901]
3D object detection from a single image is an important task in autonomous driving.
We propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection.
Date: 2021-03-05T05:47:52Z
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
Date: 2020-08-16T11:01:20Z
This list is automatically generated from the titles and abstracts of the papers on this site.