Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects
- URL: http://arxiv.org/abs/2209.12880v2
- Date: Wed, 26 Apr 2023 23:55:31 GMT
- Title: Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects
- Authors: Philip Jacobson, Yiyang Zhou, Wei Zhan, Masayoshi Tomizuka, Ming C. Wu
- Abstract summary: We propose a novel approach for building robust 3D object detection systems for autonomous vehicles.
We leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations.
On the nuScenes dataset, we outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer features than other fusion methods.
- Score: 26.59231069298659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging multi-modal fusion, especially between camera and LiDAR, has
become essential for building accurate and robust 3D object detection systems
for autonomous vehicles. Until recently, point decorating approaches, in which
point clouds are augmented with camera features, have been the dominant
approach in the field. However, these approaches fail to utilize the higher
resolution images from cameras. Recent works projecting camera features to the
bird's-eye-view (BEV) space for fusion have also been proposed; however, they
require projecting millions of pixels, most of which only contain background
information. In this work, we propose a novel approach, Center Feature Fusion
(CFF), in which we leverage center-based detection networks in both the camera
and LiDAR streams to identify relevant object locations. We then use the
center-based detections to identify the locations of pixel features relevant to
object locations, which are a small fraction of the total number in the image.
These are then projected and fused in the BEV frame. On the nuScenes dataset, we
outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer
features than other fusion methods.
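For a concrete picture of the select-then-project step, here is a minimal sketch in PyTorch. Everything in it (the tensor shapes, the `top_k`/`thresh` parameters, the `pix_to_bev` projection helper, and the additive fusion) is an illustrative assumption, not the authors' implementation; the paper's pipeline relies on its own detection heads and calibrated camera-to-BEV projection.

```python
import torch

def select_center_features(cam_feats, center_heatmap, top_k=100, thresh=0.3):
    """Keep camera features only at predicted 2D object centers.

    cam_feats:      (C, H, W) camera feature map      -- assumed shape
    center_heatmap: (H, W)    center scores in [0, 1] -- assumed shape
    Returns the selected features (K, C) and their (u, v) pixel coordinates.
    """
    W = cam_feats.shape[2]
    scores, idx = center_heatmap.flatten().topk(top_k)
    idx = idx[scores > thresh]                          # drop low-confidence centers
    v = torch.div(idx, W, rounding_mode="floor")
    u = idx % W
    feats = cam_feats[:, v, u].t()                      # sparse set of center features
    return feats, torch.stack([u, v], dim=1)

def fuse_into_bev(bev_feats, sel_feats, pix_uv, pix_to_bev):
    """Scatter the selected camera features into the LiDAR BEV grid.

    bev_feats:  (C, Hb, Wb) LiDAR BEV feature map      -- assumed shape
    pix_to_bev: caller-supplied mapping from pixel (u, v) to BEV cell (x, y);
                in practice this needs depth and calibration, elided here.
    """
    for f, (u, v) in zip(sel_feats, pix_uv.tolist()):
        x, y = pix_to_bev(u, v)
        if 0 <= x < bev_feats.shape[2] and 0 <= y < bev_feats.shape[1]:
            bev_feats[:, y, x] = bev_feats[:, y, x] + f  # simple additive fusion
    return bev_feats
```

The point of the sketch is only that features are gathered at a handful of predicted centers, so only a small fraction of the image features ever needs to be projected into BEV.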
Related papers
- FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
The integration of data from diverse sensor modalities is a prevalent methodology in autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z)
- OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation [48.828453331724965]
We propose an Omni-Aperture Fusion model (OAFuser) to extract angular information from sub-aperture images to generate semantically consistent results.
The proposed OAFuser achieves state-of-the-art performance on four UrbanLF datasets in terms of all evaluation metrics.
arXiv Detail & Related papers (2023-07-28T14:43:27Z)
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection [5.12292602924464]
This paper proposes a fusion-based 3D object detection network, named Voxel-Pixel Fusion Network (VPFNet).
The proposed method is evaluated on the KITTI benchmark for the multi-class 3D object detection task under multilevel difficulty.
It is shown to outperform all state-of-the-art methods in mean average precision (mAP).
arXiv Detail & Related papers (2021-11-01T14:17:09Z)
- CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection [8.797434238081372]
We propose a middle-fusion approach to exploit both radar and camera data for 3D object detection.
Our approach, called CenterFusion, first uses a center point detection network to detect objects.
It then solves the key data association problem using a novel frustum-based method.
arXiv Detail & Related papers (2020-11-10T00:20:23Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)