Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based
Objects
- URL: http://arxiv.org/abs/2209.12880v2
- Date: Wed, 26 Apr 2023 23:55:31 GMT
- Title: Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based
Objects
- Authors: Philip Jacobson, Yiyang Zhou, Wei Zhan, Masayoshi Tomizuka, Ming C. Wu
- Abstract summary: We propose a novel approach for building robust 3D object detection systems for autonomous vehicles.
We leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations.
On the nuScenes dataset, we outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer features than other fusion methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging multi-modal fusion, especially between camera and LiDAR, has
become essential for building accurate and robust 3D object detection systems
for autonomous vehicles. Until recently, point decorating approaches, in which
point clouds are augmented with camera features, have been the dominant
approach in the field. However, these approaches fail to utilize the
higher-resolution images from cameras. Recent works projecting camera features
to the bird's-eye-view (BEV) space for fusion have also been proposed; however,
they require projecting millions of pixels, most of which contain only
background information. In this work, we propose a novel approach, Center
Feature Fusion (CFF), in which we leverage center-based detection networks in
both the camera and LiDAR streams to identify relevant object locations. We
then use these center-based detections to identify the locations of pixel
features relevant to objects, a small fraction of the total number in the
image. These are
then projected and fused in the BEV frame. On the nuScenes dataset, we
outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer
features than other fusion methods.
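The selection step the abstract describes (gathering camera features only at detected object centers rather than projecting every pixel) can be illustrated with a minimal sketch. The heatmap threshold, top-k count, and array shapes below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def select_center_features(heatmap, features, k=50, thresh=0.3):
    """Pick the k highest-scoring center pixels and gather their features.

    heatmap:  (H, W) center-confidence scores from a center-based detector
    features: (H, W, C) per-pixel camera feature map
    Returns pixel coordinates (n, 2) and the selected features (n, C).
    """
    flat = heatmap.ravel()
    topk = np.argsort(flat)[::-1][:k]          # indices of the k largest scores
    topk = topk[flat[topk] > thresh]           # keep only confident centers
    ys, xs = np.unravel_index(topk, heatmap.shape)
    return np.stack([ys, xs], axis=1), features[ys, xs]

# Toy example: a 64x64 heatmap with one strong "object center" and
# 16-dimensional camera features per pixel (all values hypothetical).
rng = np.random.default_rng(0)
heat = rng.random((64, 64)) * 0.2
heat[10, 20] = 0.9
feat = rng.standard_normal((64, 64, 16))

coords, selected = select_center_features(heat, feat, k=5)
print(coords[0])        # → [10 20]
print(selected.shape)   # → (1, 16)
```

Only the handful of selected features would then be projected into the BEV frame for fusion, which is the source of the "up to 100x fewer features" claim above.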
Related papers
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth
Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z) - BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View
Representation [116.6111047218081]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z) - VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection [5.12292602924464]
This paper proposes a fusion-based 3D object detection network, named Voxel-Pixel Fusion Network (VPFNet)
The proposed method is evaluated on the KITTI benchmark for multi-class 3D object detection task under multilevel difficulty.
It is shown to outperform all state-of-the-art methods in mean average precision (mAP).
arXiv Detail & Related papers (2021-11-01T14:17:09Z) - Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - CenterFusion: Center-based Radar and Camera Fusion for 3D Object
Detection [8.797434238081372]
We propose a middle-fusion approach to exploit both radar and camera data for 3D object detection.
Our approach, called CenterFusion, first uses a center point detection network to detect objects.
It then solves the key data association problem using a novel frustum-based method.
arXiv Detail & Related papers (2020-11-10T00:20:23Z) - RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.