TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers
- URL: http://arxiv.org/abs/2203.11496v1
- Date: Tue, 22 Mar 2022 07:15:13 GMT
- Title: TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers
- Authors: Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu,
Chiew-Lan Tai
- Abstract summary: We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
- Score: 49.689566246504356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR and camera are two important sensors for 3D object detection in
autonomous driving. Despite the increasing popularity of sensor fusion in this
field, the robustness against inferior image conditions, e.g., bad illumination
and sensor misalignment, is under-explored. Existing fusion methods are easily
affected by such conditions, mainly due to a hard association of LiDAR points
and image pixels, established by calibration matrices. We propose TransFusion,
a robust solution to LiDAR-camera fusion with a soft-association mechanism to
handle inferior image conditions. Specifically, our TransFusion consists of
convolutional backbones and a detection head based on a transformer decoder.
The first layer of the decoder predicts initial bounding boxes from a LiDAR
point cloud using a sparse set of object queries, and its second decoder layer
adaptively fuses the object queries with useful image features, leveraging both
spatial and contextual relationships. The attention mechanism of the
transformer enables our model to adaptively determine where and what
information should be taken from the image, leading to a robust and effective
fusion strategy. We additionally design an image-guided query initialization
strategy to deal with objects that are difficult to detect in point clouds.
TransFusion achieves state-of-the-art performance on large-scale datasets. We
provide extensive experiments to demonstrate its robustness against degraded
image quality and calibration errors. We also extend the proposed method to the
3D tracking task and achieve 1st place on the nuScenes tracking leaderboard,
showing its effectiveness and generalization capability.
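To make the decoding pipeline described in the abstract concrete, the snippet below is a minimal PyTorch sketch of a two-layer, query-based decoder with a soft association to image features via cross-attention. It is not the authors' implementation: the names (SoftAssociationDecoder, lidar_bev, image_feats), the plain nn.MultiheadAttention layers, the learnable query embedding, and the 10-value box head are illustrative assumptions only.

```python
# Minimal sketch of a two-stage decoder with soft LiDAR-camera association.
# Illustrative only; shapes, layer choices, and names are assumptions, not
# the TransFusion reference implementation.
import torch
import torch.nn as nn

class SoftAssociationDecoder(nn.Module):
    def __init__(self, dim=128, num_queries=200, num_heads=8):
        super().__init__()
        # Sparse set of object queries (the actual method initializes queries
        # from detected object centers; a learnable embedding is used here
        # for brevity).
        self.queries = nn.Embedding(num_queries, dim)
        # Decoder layer 1: queries attend to flattened LiDAR BEV features.
        self.lidar_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Decoder layer 2: queries attend to flattened image features -- the
        # "soft association", with no hard point-to-pixel projection.
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.box_head = nn.Linear(dim, 10)  # e.g. center, size, yaw, velocity

    def forward(self, lidar_bev, image_feats):
        # lidar_bev:   (B, N_bev, dim) flattened BEV feature map
        # image_feats: (B, N_img, dim) flattened multi-view image features
        b = lidar_bev.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        # Stage 1: initial bounding boxes from LiDAR features only.
        q, _ = self.lidar_attn(q, lidar_bev, lidar_bev)
        initial_boxes = self.box_head(q)
        # Stage 2: attention decides where and what to take from the image.
        q, attn_weights = self.image_attn(q, image_feats, image_feats)
        refined_boxes = self.box_head(q)
        return initial_boxes, refined_boxes, attn_weights
```

The returned attention weights make the soft association explicit: instead of projecting LiDAR points onto pixels with calibration matrices, each query learns where and what to take from the image, which is why degraded images or miscalibration degrade the fusion gracefully rather than breaking it.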
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV).
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z) - FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
Integrating data from diverse sensor modalities is a prevalent practice in autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z) - Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z) - Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z) - TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object
Detection [13.986963122264633]
TransCAR is a Transformer-based Camera-And-Radar fusion solution for 3D object detection.
Our model estimates a bounding box per query using a set-to-set Hungarian loss (a minimal matching sketch appears after this list).
arXiv Detail & Related papers (2023-04-30T05:35:03Z) - 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object
Detection [13.068266058374775]
We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domains and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-11-24T11:00:50Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic
Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.