3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection
- URL: http://arxiv.org/abs/2004.12636v2
- Date: Tue, 21 Jul 2020 03:00:03 GMT
- Title: 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection
- Authors: Jin Hyeok Yoo and Yecheol Kim and Jisong Kim and Jun Won Choi
- Abstract summary: We propose a new architecture for fusing camera and LiDAR sensors for 3D object detection.
The proposed 3D-CVF achieves state-of-the-art performance on the KITTI benchmark.
- Score: 10.507404260449333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new deep architecture for fusing camera and LiDAR
sensors for 3D object detection. Because the camera and LiDAR sensor signals
have different characteristics and distributions, fusing these two modalities
is expected to improve both the accuracy and robustness of 3D object detection.
One of the challenges presented by the fusion of cameras and LiDAR is that the
spatial feature maps obtained from each modality are represented by
significantly different views in the camera and world coordinates; hence, it is
not an easy task to combine two heterogeneous feature maps without loss of
information. To address this problem, we propose a method called 3D-CVF that
combines the camera and LiDAR features using the cross-view spatial feature
fusion strategy. First, the method employs auto-calibrated projection to
transform the 2D camera features to a smooth spatial feature map with the
highest correspondence to the LiDAR features in the bird's eye view (BEV)
domain. Then, a gated feature fusion network applies spatial attention maps to
mix the camera and LiDAR features adaptively in each region. Camera-LiDAR
feature fusion is also performed in the subsequent proposal refinement stage:
camera features are drawn from the 2D camera-view domain via 3D RoI grid
pooling and fused with the BEV features for
proposal refinement. Our evaluations, conducted on the KITTI and nuScenes 3D
object detection datasets, demonstrate that camera-LiDAR fusion offers a
significant performance gain over single-modality detection and that the
proposed 3D-CVF achieves state-of-the-art performance on the KITTI benchmark.
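Two of the steps above are concrete enough to sketch in code. First, the cross-view projection: below is a minimal PyTorch sketch of warping a camera feature map onto a BEV grid using the camera calibration, assuming a single fixed reference height per BEV cell. The paper's auto-calibrated projection additionally learns and interpolates the correspondence, which is omitted here; the function name, default ranges, and arguments are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def camera_features_to_bev(cam_feat, T_cam_from_lidar, K,
                           x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                           bev_shape=(200, 176), z_ref=0.0):
    """Warp a 2D camera feature map onto a BEV grid (hypothetical helper).

    cam_feat:         (1, C, Hc, Wc) camera feature map
    T_cam_from_lidar: (4, 4) extrinsic calibration matrix
    K:                (3, 3) camera intrinsics
    """
    Hb, Wb = bev_shape
    device = cam_feat.device
    # 3D centers of the BEV cells at a fixed reference height z_ref.
    xs = torch.linspace(x_range[0], x_range[1], Wb, device=device)
    ys = torch.linspace(y_range[0], y_range[1], Hb, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    pts = torch.stack([xx, yy, torch.full_like(xx, z_ref),
                       torch.ones_like(xx)], dim=-1).reshape(-1, 4)
    cam_pts = (T_cam_from_lidar @ pts.T).T[:, :3]   # LiDAR frame -> camera frame
    uvw = (K @ cam_pts.T).T                          # camera frame -> pixels
    depth = uvw[:, 2]
    uv = uvw[:, :2] / depth.clamp(min=1e-5).unsqueeze(-1)
    _, _, Hc, Wc = cam_feat.shape
    u = uv[:, 0] / (Wc - 1) * 2.0 - 1.0              # normalize to [-1, 1]
    v = uv[:, 1] / (Hc - 1) * 2.0 - 1.0
    # Cells that project behind the camera are pushed out of range so that
    # grid_sample fills them with zeros.
    u = u.masked_fill(depth <= 0, 2.0)
    grid = torch.stack([u, v], dim=-1).reshape(1, Hb, Wb, 2)
    return F.grid_sample(cam_feat, grid, align_corners=True,
                         padding_mode="zeros")       # (1, C, Hb, Wb)
```

Second, the gated fusion: a minimal sketch of an attention-gated fusion module over the BEV-aligned features. The gating design (one 3x3 convolution followed by a sigmoid per modality) is an assumed simplification, not necessarily the paper's exact network.

```python
import torch
import torch.nn as nn

class GatedFeatureFusion(nn.Module):
    """Attention-gated fusion of BEV-aligned camera and LiDAR feature maps."""

    def __init__(self, lidar_channels: int, camera_channels: int):
        super().__init__()
        in_ch = lidar_channels + camera_channels
        # Each modality gets a per-location gate in [0, 1] predicted from
        # the concatenated features of both modalities.
        self.lidar_gate = nn.Sequential(
            nn.Conv2d(in_ch, 1, kernel_size=3, padding=1), nn.Sigmoid())
        self.camera_gate = nn.Sequential(
            nn.Conv2d(in_ch, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, lidar_bev, camera_bev):
        # lidar_bev: (B, Cl, H, W), camera_bev: (B, Cc, H, W) on the same grid.
        joint = torch.cat([lidar_bev, camera_bev], dim=1)
        gated_lidar = lidar_bev * self.lidar_gate(joint)      # spatial attention
        gated_camera = camera_bev * self.camera_gate(joint)
        return torch.cat([gated_lidar, gated_camera], dim=1)  # joint BEV feature
```

The point of computing each gate from the concatenation of both modalities is that the network can down-weight camera features in regions where LiDAR is already reliable, and vice versa, region by region.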
Related papers
- BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection [10.321117046185321]
This letter proposes a novel bidirectional complementary LiDAR-camera fusion framework, called BiCo-Fusion.
The key insight is to mutually fuse the multi-modal features to enhance the semantics of the LiDAR features and the spatial awareness of the camera features.
We then introduce Unified Fusion to adaptively weight and select features from the enhanced LiDAR and camera features to build a unified 3D representation.
arXiv Detail & Related papers (2024-06-27T09:56:38Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Objects [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection [14.706717531900708]
LiDAR and camera are two essential sensors for 3D object detection in autonomous driving.
Recent methods focus on point-level fusion, which paints the LiDAR point cloud with camera features in the perspective view (see the sketch after this list).
We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation.
arXiv Detail & Related papers (2022-12-09T05:48:58Z)
- 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection [13.068266058374775]
We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-11-24T11:00:50Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim to explore the semantic density of camera features by lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method is proposed that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation.
arXiv Detail & Related papers (2022-08-25T13:21:37Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LiDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LiDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
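As referenced in the SemanticBEVFusion entry above, point-level fusion "paints" each LiDAR point with the camera feature at its projected pixel. Below is a minimal sketch under assumed conventions (hypothetical helper name, nearest-neighbour sampling, a single camera); it illustrates the general technique, not any specific paper's implementation.

```python
import torch

def paint_points(points, cam_feat, T_cam_from_lidar, K):
    """Decorate each LiDAR point with the camera feature at its pixel.

    points:   (N, 3) LiDAR points
    cam_feat: (C, Hc, Wc) camera feature map
    """
    N = points.shape[0]
    ones = torch.ones(N, 1)
    # Project points into the camera: LiDAR frame -> camera frame -> pixels.
    cam_pts = (T_cam_from_lidar @ torch.cat([points, ones], dim=1).T).T[:, :3]
    uvw = (K @ cam_pts.T).T
    depth = uvw[:, 2]
    u = (uvw[:, 0] / depth.clamp(min=1e-5)).round().long()
    v = (uvw[:, 1] / depth.clamp(min=1e-5)).round().long()
    C, Hc, Wc = cam_feat.shape
    # Keep only points in front of the camera and inside the image.
    valid = (depth > 0) & (u >= 0) & (u < Wc) & (v >= 0) & (v < Hc)
    painted = torch.zeros(N, C)
    painted[valid] = cam_feat[:, v[valid], u[valid]].T  # per-point features
    return torch.cat([points, painted], dim=1)          # (N, 3 + C)
```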