From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
- URL: http://arxiv.org/abs/2209.12254v1
- Date: Sun, 25 Sep 2022 16:10:14 GMT
- Title: From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
- Authors: Rui Wan, Shuangjie Xu, Wei Wu, Xiaoyi Zou, Tongyi Cao
- Abstract summary: Existing fusion methods tend to align each 3D point to only one projected image pixel based on calibration.
We propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping.
The whole fusion architecture named Dynamic Cross Attention Network (DCAN) exploits multi-level image features and adapts to multiple representations of point clouds.
- Score: 12.792769704561024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR and cameras are two complementary sensors for 3D perception in
autonomous driving. LiDAR point clouds have accurate spatial and geometry
information, while RGB images provide textural and color data for context
reasoning. To exploit LiDAR and cameras jointly, existing fusion methods tend
to align each 3D point to only one projected image pixel based on calibration,
namely one-to-one mapping. However, the performance of these approaches highly
relies on the calibration quality, which is sensitive to the temporal and
spatial synchronization of sensors. Therefore, we propose a Dynamic Cross
Attention (DCA) module with a novel one-to-many cross-modality mapping that
learns multiple offsets from the initial projection towards the neighborhood
and thus develops tolerance to calibration error. Moreover, a dynamic
query enhancement is proposed to perceive the model-independent calibration,
which further strengthens DCA's tolerance to the initial misalignment. The
whole fusion architecture named Dynamic Cross Attention Network (DCAN) exploits
multi-level image features and adapts to multiple representations of point
clouds, which allows DCA to serve as a plug-in fusion module. Extensive
experiments on nuScenes and KITTI prove DCA's effectiveness. The proposed DCAN
outperforms state-of-the-art methods on the nuScenes detection challenge.
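The core mechanism described above, learning multiple offsets around the initial calibrated projection instead of trusting a single point-to-pixel correspondence, can be illustrated with a short sketch. The code below is a minimal, deformable-attention-style approximation of that idea, not the authors' implementation; every name and hyperparameter here (DynamicCrossAttentionSketch, num_offsets, offset_scale, the projection helper) is an assumption for illustration.

```python
# Hedged sketch of a one-to-many cross-modality mapping in the spirit of DCA.
# Not the authors' code; module/parameter names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_points(points, lidar2img):
    """One-to-one mapping: project N LiDAR points (N, 3) to pixel coords (N, 2)
    using a 3x4 calibration matrix (intrinsics @ extrinsics)."""
    ones = torch.ones_like(points[:, :1])
    homo = torch.cat([points, ones], dim=1)             # (N, 4) homogeneous coords
    cam = homo @ lidar2img.T                            # (N, 3)
    return cam[:, :2] / cam[:, 2:3].clamp(min=1e-5)     # perspective divide -> (u, v)

class DynamicCrossAttentionSketch(nn.Module):
    """Samples K learned offsets around each point's initial projection and
    fuses the sampled image features with attention weights, so a slightly
    miscalibrated projection can still reach the correct pixels nearby."""
    def __init__(self, point_dim, img_dim, num_offsets=4, offset_scale=8.0):
        super().__init__()
        self.num_offsets = num_offsets
        self.offset_scale = offset_scale                 # max shift in pixels (assumed)
        self.offset_head = nn.Linear(point_dim, num_offsets * 2)
        self.weight_head = nn.Linear(point_dim, num_offsets)
        self.out_proj = nn.Linear(img_dim, point_dim)

    def forward(self, point_feats, base_uv, img_feats):
        # point_feats: (N, Cp) per-point queries; base_uv: (N, 2) projected pixels
        # img_feats: (1, Ci, H, W) image feature map
        N = point_feats.shape[0]
        H, W = img_feats.shape[-2:]
        offsets = torch.tanh(self.offset_head(point_feats))       # (N, K*2), in [-1, 1]
        offsets = offsets.view(N, self.num_offsets, 2) * self.offset_scale
        uv = base_uv[:, None, :] + offsets                        # (N, K, 2): one-to-many
        # Normalize pixel coords to [-1, 1] for grid_sample.
        grid = torch.stack([uv[..., 0] / (W - 1),
                            uv[..., 1] / (H - 1)], dim=-1) * 2 - 1
        sampled = F.grid_sample(img_feats, grid[None], align_corners=True)  # (1, Ci, N, K)
        sampled = sampled[0].permute(1, 2, 0)                     # (N, K, Ci)
        attn = self.weight_head(point_feats).softmax(dim=-1)      # (N, K) attention weights
        fused = (attn[..., None] * sampled).sum(dim=1)            # (N, Ci)
        return point_feats + self.out_proj(fused)                 # residual fusion
```

Because the offsets are predicted from the point features and bounded by offset_scale, a small calibration error shifts where the attention mass lands in the neighborhood rather than breaking the correspondence outright.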
Related papers
- LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation.
Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment [63.83894701779067]
We propose LCPS, the first LiDAR-Camera Panoptic Segmentation network.
In our approach, we conduct LiDAR-Camera fusion in three stages.
Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the NuScenes dataset.
arXiv Detail & Related papers (2023-08-03T10:57:58Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3D object detection (3OD) paradigm to narrow the cross-sensor discrepancies by progressively fusing the multi-scale features of camera Images and LiDAR point clouds.
First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features.
Second, we formulate detection as a direct set prediction problem, which allows designing an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z)
- FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z)
- FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
The corresponding indexes between different sensor signals are established in advance during data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
arXiv Detail & Related papers (2022-09-15T16:13:19Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method is proposed that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation.
arXiv Detail & Related papers (2022-08-25T13:21:37Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions; a rough sketch of such a soft-association step appears after this list.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
- Spatiotemporal Camera-LiDAR Calibration: A Targetless and Structureless Approach [32.15405927679048]
We propose a targetless and structureless camera-LiDAR calibration method.
Our method combines a closed-form solution with a structureless bundle adjustment, where the coarse-to-fine approach does not require an initial guess on the temporal parameters.
We demonstrate the accuracy and robustness of the proposed method through both simulation and real data experiments.
arXiv Detail & Related papers (2020-01-17T07:25:59Z)
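As a contrast to the hard one-to-one projection discussed in the main abstract, the soft-association idea mentioned in the TransFusion entry above can be sketched as plain cross attention: each object query attends over all image tokens with learned weights instead of being tied to a single calibrated pixel. This is a rough illustration under assumed shapes and names (SoftAssociationSketch, num_heads), not the TransFusion implementation.

```python
# Hedged sketch of soft association in the spirit of TransFusion; names and
# shapes are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class SoftAssociationSketch(nn.Module):
    """Instead of hard-assigning each object query to one calibrated pixel,
    every query attends over all image features with learned weights."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, img_feats):
        # queries: (B, Q, C) object queries from the LiDAR branch
        # img_feats: (B, C, H, W) image feature map, flattened into tokens
        tokens = img_feats.flatten(2).transpose(1, 2)   # (B, H*W, C)
        fused, _ = self.attn(queries, tokens, tokens)   # soft association
        return queries + fused                          # residual update
```

Because the association weights are learned rather than fixed by calibration, degraded or misaligned image regions simply receive low attention instead of corrupting the fused feature.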
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.