FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
- URL: http://arxiv.org/abs/2203.10642v2
- Date: Sat, 15 Apr 2023 13:05:44 GMT
- Title: FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
- Authors: Xuanyao Chen, Tianyuan Zhang, Yue Wang, Yilun Wang, Hang Zhao
- Abstract summary: We propose the first unified end-to-end sensor fusion framework for 3D detection, named FUTR3D, which can be used with almost any sensor configuration.
FUTR3D employs a query-based Modality-Agnostic Feature Sampler (MAFS) together with a transformer decoder and a set-to-set loss for 3D detection.
On the NuScenes dataset, FUTR3D achieves better performance than specifically designed methods across different sensor combinations.
- Score: 18.70932813595532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sensor fusion is an essential topic in many perception systems, such as
autonomous driving and robotics. Existing multi-modal 3D detection models
usually involve customized designs depending on the sensor combinations or
setups. In this work, we propose the first unified end-to-end sensor fusion
framework for 3D detection, named FUTR3D, which can be used in (almost) any
sensor configuration. FUTR3D employs a query-based Modality-Agnostic Feature
Sampler (MAFS), together with a transformer decoder with a set-to-set loss for
3D detection, thus avoiding using late fusion heuristics and post-processing
tricks. We validate the effectiveness of our framework on various combinations
of cameras, low-resolution LiDARs, high-resolution LiDARs, and Radars. On
the NuScenes dataset, FUTR3D achieves better performance than specifically designed
methods across different sensor combinations. Moreover, FUTR3D achieves great
flexibility with different sensor configurations and enables low-cost
autonomous driving. For example, only using a 4-beam LiDAR with cameras, FUTR3D
(58.0 mAP) performs on par with the state-of-the-art 3D detection model
CenterPoint (56.6 mAP) using a 32-beam LiDAR.
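The abstract's key mechanism is that each object query carries a 3D reference point whose features are gathered from whichever sensor feature maps happen to be available and then fused, so the downstream transformer decoder stays agnostic to the sensor setup. Below is a minimal, self-contained PyTorch sketch of that query-based sampling idea; the class name SimpleMAFS, the tensor shapes, and the use of bilinear grid sampling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of query-based, modality-agnostic feature sampling
# (inspired by the MAFS idea in the abstract; NOT the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMAFS(nn.Module):
    """Gathers per-query features from any subset of modality feature maps.

    Each query carries a 3D reference point. For every available modality we
    assume that point has already been projected into the modality's feature
    map; we bilinearly sample a feature there and sum over modalities.
    Missing modalities are simply skipped.
    """

    def __init__(self, embed_dim=256):
        super().__init__()
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, ref_points_2d, feature_maps):
        # ref_points_2d: dict {modality: (B, Q, 2) points in [-1, 1] grid coords}
        # feature_maps:  dict {modality: (B, C, H, W) feature map}
        fused = None
        for name, feats in feature_maps.items():
            pts = ref_points_2d[name].unsqueeze(2)            # (B, Q, 1, 2)
            sampled = F.grid_sample(feats, pts, align_corners=False)
            sampled = sampled.squeeze(-1).transpose(1, 2)      # (B, Q, C)
            fused = sampled if fused is None else fused + sampled
        return self.out_proj(fused)


if __name__ == "__main__":
    B, Q, C = 2, 10, 256
    mafs = SimpleMAFS(embed_dim=C)
    # Works with any sensor subset: here camera + LiDAR BEV features.
    feats = {"camera": torch.randn(B, C, 32, 56), "lidar_bev": torch.randn(B, C, 64, 64)}
    refs = {k: torch.rand(B, Q, 2) * 2 - 1 for k in feats}     # projected reference points
    print(mafs(refs, feats).shape)                             # torch.Size([2, 10, 256])
```

In the full framework these fused query features would feed a transformer decoder trained with a set-to-set (bipartite-matching) loss, which is what removes the need for late-fusion heuristics and post-processing; the sketch covers only the sampling step.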
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking [26.976216624424385]
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene.
We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
arXiv Detail & Related papers (2023-10-04T02:17:59Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their RoI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3D object detection paradigm that narrows cross-sensor discrepancies by progressively fusing the multi-scale features of camera images and LiDAR point clouds.
First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features.
Second, we formulate a direct set prediction problem that enables the design of an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z)
- HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection [0.0]
We propose HRFuser, a modular architecture for multi-modal 2D object detection.
It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities.
We demonstrate via experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities.
arXiv Detail & Related papers (2022-06-30T09:40:05Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds onto the corresponding 2D RoIs of the images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end; a minimal sketch of the underlying depth-to-point-cloud conversion follows this list.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
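The last entry's conversion step can be made concrete: pseudo-LiDAR back-projects every pixel of a predicted depth map into a 3D point using the camera intrinsics, and keeping this step differentiable is what lets the detection loss train the depth estimator end-to-end. The following is a minimal, hedged PyTorch sketch of that depth-to-point-cloud conversion; the function name and the KITTI-like intrinsics values are illustrative, not taken from the paper.

```python
# Minimal sketch: back-project a dense depth map into a pseudo-LiDAR point
# cloud using pinhole intrinsics. The operation is differentiable w.r.t. the
# depth input, which is the property the CoR idea relies on. Illustrative only.
import torch


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """depth: (H, W) tensor of metric depths -> (H*W, 3) points in the camera frame."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    z = depth
    x = (u - cx) / fx * z          # pinhole back-projection
    y = (v - cy) / fy * z
    return torch.stack([x, y, z], dim=-1).reshape(-1, 3)


if __name__ == "__main__":
    depth = torch.rand(375, 1242) * 80.0        # fake depth map in metres
    depth.requires_grad_(True)
    points = depth_to_pseudo_lidar(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
    print(points.shape)                          # torch.Size([465750, 3])
    points.sum().backward()                      # gradients reach the depth map
```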
This list is automatically generated from the titles and abstracts of the papers on this site.