TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers
- URL: http://arxiv.org/abs/2203.11496v1
- Date: Tue, 22 Mar 2022 07:15:13 GMT
- Title: TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers
- Authors: Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu,
Chiew-Lan Tai
- Abstract summary: We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
- Score: 49.689566246504356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR and camera are two important sensors for 3D object detection in
autonomous driving. Despite the increasing popularity of sensor fusion in this
field, the robustness against inferior image conditions, e.g., bad illumination
and sensor misalignment, is under-explored. Existing fusion methods are easily
affected by such conditions, mainly due to a hard association of LiDAR points
and image pixels, established by calibration matrices. We propose TransFusion,
a robust solution to LiDAR-camera fusion with a soft-association mechanism to
handle inferior image conditions. Specifically, our TransFusion consists of
convolutional backbones and a detection head based on a transformer decoder.
The first layer of the decoder predicts initial bounding boxes from a LiDAR
point cloud using a sparse set of object queries, and its second decoder layer
adaptively fuses the object queries with useful image features, leveraging both
spatial and contextual relationships. The attention mechanism of the
transformer enables our model to adaptively determine where and what
information should be taken from the image, leading to a robust and effective
fusion strategy. We additionally design an image-guided query initialization
strategy to deal with objects that are difficult to detect in point clouds.
TransFusion achieves state-of-the-art performance on large-scale datasets. We
provide extensive experiments to demonstrate its robustness against degraded
image quality and calibration errors. We also extend the proposed method to the
3D tracking task and achieve 1st place on the nuScenes tracking leaderboard,
showing its effectiveness and generalization capability.
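At its core, the soft-association mechanism the abstract describes is cross-attention between object queries and image features: each query attends over all image features with softmax weights instead of looking up pixels through a hard calibration mapping. A minimal NumPy sketch of this idea (illustrative only, not the authors' implementation; the projection matrices here are random stand-ins for learned weights, and all dimensions are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_association_fusion(queries, image_feats, d_k=32, seed=0):
    """Fuse object queries with image features via cross-attention.

    queries:     (N, D) object query embeddings (e.g. from a LiDAR decoder layer)
    image_feats: (M, D) flattened image feature vectors
    Returns (N, D) fused queries: every query attends over ALL image
    features, so no hard point-to-pixel association is required.
    """
    rng = np.random.default_rng(seed)
    D = queries.shape[1]
    # Stand-ins for learned Q/K/V projections, randomly initialized here.
    W_q = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_k = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_v = rng.standard_normal((D, D)) / np.sqrt(D)

    Q = queries @ W_q        # (N, d_k)
    K = image_feats @ W_k    # (M, d_k)
    V = image_feats @ W_v    # (M, D)

    # (N, M) soft-association weights: each row sums to 1.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return queries + attn @ V  # residual fusion of attended image features

queries = np.random.default_rng(1).standard_normal((4, 64))   # 4 object queries
img = np.random.default_rng(2).standard_normal((100, 64))     # 100 image features
fused = soft_association_fusion(queries, img)
print(fused.shape)  # (4, 64)
```

Because the association weights are learned attention scores rather than fixed calibration lookups, degraded pixels or small misalignments only down-weight image evidence instead of corrupting it, which is the robustness argument the abstract makes.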
Related papers
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z)
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision
Transformer Fusion [8.168523242105763]
We will introduce a novel vision transformer-based 3D object detection model, namely FusionViT.
Our FusionViT model can achieve state-of-the-art performance and outperforms existing baseline methods.
arXiv Detail & Related papers (2023-11-07T00:12:01Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object
Detection [13.986963122264633]
TransCAR is a Transformer-based Camera-And-Radar fusion solution for 3D object detection.
Our model estimates a bounding box per query using a set-to-set Hungarian loss.
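A set-to-set (DETR-style) loss first matches predicted boxes to ground-truth boxes one-to-one by minimizing a total matching cost, then computes the loss over matched pairs. A toy sketch of the matching step (brute force over permutations, viable only for small N; real implementations use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`; the cost values below are invented for illustration):

```python
from itertools import permutations
import numpy as np

def hungarian_match(cost):
    """Minimum-cost one-to-one assignment by exhaustive search (small N only).

    cost: (N, N) matrix where cost[i, j] is the matching cost between
    prediction i and ground-truth j. Returns the optimal permutation
    (prediction i is matched to ground truth best_perm[i]) and its total cost.
    """
    n = cost.shape[0]
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i, j] for i, j in enumerate(perm))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_perm, best_cost

# Illustrative cost matrix for 3 predictions vs 3 ground-truth boxes.
cost = np.array([[9.0, 1.0, 8.0],
                 [2.0, 9.0, 7.0],
                 [8.0, 7.0, 1.0]])
perm, total = hungarian_match(cost)
print(perm, total)  # (1, 0, 2) 4.0
```

The matched pairs then receive the regression and classification loss, while unmatched predictions are trained toward the background class, which removes the need for post-hoc duplicate suppression.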
arXiv Detail & Related papers (2023-04-30T05:35:03Z)
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye
View Representation for 3D Object Detection [14.706717531900708]
LiDAR and camera are two essential sensors for 3D object detection in autonomous driving.
Recent methods focus on point-level fusion which paints the LiDAR point cloud with camera features in the perspective view.
We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation.
arXiv Detail & Related papers (2022-12-09T05:48:58Z)
- 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object
Detection [13.068266058374775]
We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-11-24T11:00:50Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic
Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components enables effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.