Multi-Modal 3D Object Detection by Box Matching
- URL: http://arxiv.org/abs/2305.07713v1
- Date: Fri, 12 May 2023 18:08:51 GMT
- Title: Multi-Modal 3D Object Detection by Box Matching
- Authors: Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding,
Jingdong Wang, Xiang Bai
- Abstract summary: We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their RoI features.
- Score: 109.43430123791684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal 3D object detection has received growing attention as the
information from different sensors like LiDAR and cameras is complementary.
Most fusion methods for 3D detection rely on an accurate alignment and
calibration between 3D point clouds and RGB images. However, such an assumption
is not reliable in a real-world self-driving system, as the alignment between
different modalities is easily affected by asynchronous sensors and disturbed
sensor placement. We propose a novel Fusion network by Box Matching
(FBMNet) for multi-modal 3D detection, which provides an alternative way for
cross-modal feature alignment by learning the correspondence at the bounding
box level, removing the dependency on calibration during inference. With the
learned assignments between 3D and 2D object proposals, the fusion for
detection can be effectively performed by combining their RoI features. Extensive
experiments on the nuScenes dataset demonstrate that our method is much more
stable in dealing with challenging cases such as asynchronous sensors,
misaligned sensor placement, and degenerated camera images than existing fusion
methods. We hope that our FBMNet can provide a practical solution for dealing
with these challenging cases for safety in real autonomous driving scenarios.
Codes will be publicly available at https://github.com/happinesslz/FBMNet.
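As a rough illustration of the box-matching idea, the sketch below pairs each 3D proposal with image proposals through a learned soft assignment and fuses the matched RoI features. The module names and the softmax assignment are illustrative assumptions, not FBMNet's actual design; note that no calibration matrix appears anywhere, mirroring the paper's selling point.
```python
import torch
import torch.nn as nn

class BoxMatchingFusion(nn.Module):
    """Illustrative sketch of box-level fusion: pair 3D proposals with
    2D proposals via a learned soft assignment, then fuse RoI features.
    NOT FBMNet's actual architecture."""

    def __init__(self, dim=256):
        super().__init__()
        self.proj_3d = nn.Linear(dim, dim)   # embed 3D proposal RoI features
        self.proj_2d = nn.Linear(dim, dim)   # embed 2D proposal RoI features
        self.fuse = nn.Linear(2 * dim, dim)  # combine matched features

    def forward(self, feats_3d, feats_2d):
        # feats_3d: (N, C) LiDAR proposal features; feats_2d: (M, C) image ones.
        q = self.proj_3d(feats_3d)
        k = self.proj_2d(feats_2d)
        sim = q @ k.t() / q.shape[-1] ** 0.5   # (N, M) pairwise similarity
        assign = sim.softmax(dim=-1)           # soft assignment over 2D boxes
        matched_2d = assign @ feats_2d         # expected 2D feature per 3D box
        return self.fuse(torch.cat([feats_3d, matched_2d], dim=-1))

# Usage: 40 LiDAR proposals matched against 60 image proposals.
fusion = BoxMatchingFusion(dim=256)
print(fusion(torch.randn(40, 256), torch.randn(60, 256)).shape)  # (40, 256)
```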
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3D object detection (3OD) paradigm that narrows cross-sensor discrepancies by progressively fusing the multi-scale features of camera images and LiDAR point clouds.
First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features.
Second, we formulate a direct set prediction problem that allows the design of an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z)
- MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR, a Multi-Sensor Fusion 3D Detection Transformer architecture that fuses image and LiDAR features to improve detection accuracy.
Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
The MSF3DDETR network is trained end-to-end on the nuScenes dataset with Hungarian-algorithm-based bipartite matching and a DETR-inspired set-to-set loss.
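For context, DETR-style bipartite matching can be sketched with SciPy's Hungarian solver; the cost terms and weights below are illustrative assumptions rather than MSF3DDETR's exact formulation.
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, pred_logits, gt_boxes, gt_labels, box_weight=5.0):
    """Pair each ground-truth box with exactly one prediction by
    minimising a combined classification + box cost (DETR-style)."""
    # Numerically stable softmax over class logits.
    z = pred_logits - pred_logits.max(-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(-1, keepdims=True)
    cost_cls = -probs[:, gt_labels]                       # (num_pred, num_gt)
    cost_box = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)  # L1 cost
    pred_idx, gt_idx = linear_sum_assignment(cost_cls + box_weight * cost_box)
    return pred_idx, gt_idx  # matched (prediction, ground-truth) index pairs

# Usage: 100 predicted 3D boxes (7 params each, 10 classes), 5 ground truths.
rng = np.random.default_rng(0)
p, g = hungarian_match(rng.normal(size=(100, 7)), rng.normal(size=(100, 10)),
                       rng.normal(size=(5, 7)), np.array([1, 3, 0, 2, 9]))
print(list(zip(p, g)))
```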
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the robustness of the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- FUTR3D: A Unified Sensor Fusion Framework for 3D Detection [18.70932813595532]
We propose the first unified end-to-end sensor fusion framework for 3D detection, named FUTR3D, which can be used in almost any sensor configuration.
FUTR3D employs a query-based Modality-Agnostic Feature Sampler (MAFS), together with a transformer decoder with a set-to-set loss for 3D detection.
On the nuScenes dataset, FUTR3D achieves better performance than specifically designed methods across different sensor combinations.
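The sketch below shows what a query-based, modality-agnostic feature sampler might look like: each query's 3D reference point gathers bilinearly sampled features from a camera feature map and a LiDAR BEV map. Shapes, names, and the normalised-coordinate projection are assumptions, not FUTR3D's released code.
```python
import torch
import torch.nn.functional as F

def sample_modality_features(ref_points, cam_feats, cam_proj, bev_feats, bev_range):
    """Each query's 3D reference point gathers features from every sensor:
    bilinear samples from a camera feature map and a LiDAR BEV map."""
    # Project 3D reference points (Q, 3) into normalised image coordinates.
    homo = torch.cat([ref_points, torch.ones_like(ref_points[:, :1])], dim=-1)
    uvw = homo @ cam_proj.t()                         # (Q, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-5)     # assume uv in [0, 1]
    cam_grid = (uv * 2 - 1).view(1, -1, 1, 2)         # grid_sample wants [-1, 1]
    cam = F.grid_sample(cam_feats, cam_grid, align_corners=False)

    # Sample LiDAR BEV features at the same (x, y) ground-plane locations.
    bev_grid = (ref_points[:, :2] / bev_range * 2 - 1).view(1, -1, 1, 2)
    bev = F.grid_sample(bev_feats, bev_grid, align_corners=False)

    # Fuse per query by summation; a learned projection could replace this.
    return (cam + bev).squeeze(-1).squeeze(0).t()     # (Q, C)

# Usage: 8 queries, 64 channels, 32x32 camera map, 64x64 BEV map.
out = sample_modality_features(torch.rand(8, 3), torch.randn(1, 64, 32, 32),
                               torch.eye(3, 4), torch.randn(1, 64, 64, 64), 50.0)
print(out.shape)  # torch.Size([8, 64])
```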
arXiv Detail & Related papers (2022-03-20T20:41:55Z)
- DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection [29.722784254501768]
DetMatch is a flexible framework for joint semi-supervised learning on 2D and 3D modalities.
By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels.
We leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes.
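A minimal sketch of the underlying idea, assuming the 3D boxes have already been projected into the image plane: a detection is kept as a pseudo-label only when both modalities agree. The helper below is hypothetical, not the authors' pipeline.
```python
import numpy as np

def iou_2d(a, b):
    """Pairwise IoU between axis-aligned boxes given as (x1, y1, x2, y2)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(br - tl, 0, None).prod(-1)
    area_a = (a[:, 2:] - a[:, :2]).prod(-1)
    area_b = (b[:, 2:] - b[:, :2]).prod(-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def cross_modal_pseudo_labels(boxes_3d_proj, boxes_2d, thresh=0.5):
    """Keep a 3D detection as a pseudo-label only when its image-plane
    projection agrees with some 2D detection (hypothetical helper)."""
    iou = iou_2d(boxes_3d_proj, boxes_2d)
    best_2d = iou.argmax(axis=1)
    keep = iou[np.arange(len(boxes_3d_proj)), best_2d] >= thresh
    return np.flatnonzero(keep), best_2d[keep]  # matched (3D, 2D) indices

# Usage: two projected 3D detections vs. two 2D detections.
b3 = np.array([[10., 10., 50., 50.], [60., 60., 90., 90.]])
b2 = np.array([[12., 11., 49., 52.], [200., 200., 240., 240.]])
print(cross_modal_pseudo_labels(b3, b2))  # (array([0]), array([0]))
```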
arXiv Detail & Related papers (2022-03-17T17:58:00Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
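The projection step at the heart of this approach can be sketched as follows: the 8 corners of a 3D box are pushed through a 3x4 camera matrix and their image-plane extremes form the 2D RoI. The axis-aligned box and toy camera below are simplifying assumptions.
```python
import numpy as np

def box3d_to_2d_roi(center, size, P):
    """Project the 8 corners of an axis-aligned 3D box through a 3x4
    camera matrix P and take their image-plane extremes as the 2D RoI."""
    dx, dy, dz = np.asarray(size) / 2.0
    offsets = np.array([[sx, sy, sz] for sx in (-dx, dx)
                        for sy in (-dy, dy) for sz in (-dz, dz)])
    corners = np.asarray(center) + offsets         # (8, 3) box corners
    homo = np.hstack([corners, np.ones((8, 1))])   # homogeneous coordinates
    uvw = homo @ P.T                               # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                  # perspective divide
    return np.concatenate([uv.min(0), uv.max(0)])  # (x1, y1, x2, y2)

# Usage with a toy pinhole camera looking down +z (fx = fy = 700).
P = np.array([[700., 0., 320., 0.],
              [0., 700., 240., 0.],
              [0., 0., 1., 0.]])
print(box3d_to_2d_roi(center=[0., 0., 20.], size=[4., 2., 1.5], P=P))
```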
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A Siamese network is integrated into the pipeline of a well-known 3D object detector.
The learned associations are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
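As a rough sketch of the Siamese-association idea (hypothetical code, not the authors' implementation), a shared embedding network scores whether detections from adjacent cameras show the same object, so matched pairs can be merged instead of suppressed.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseMatcher(nn.Module):
    """Shared embedding network scoring whether two detections from
    adjacent cameras show the same object (hypothetical sketch)."""

    def __init__(self, in_dim=128, emb_dim=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                   nn.Linear(emb_dim, emb_dim))

    def forward(self, feats_a, feats_b):
        # The same weights process both branches: the "Siamese" part.
        za = F.normalize(self.embed(feats_a), dim=-1)
        zb = F.normalize(self.embed(feats_b), dim=-1)
        return za @ zb.t()  # pairwise cosine similarity in [-1, 1]

# Usage: 5 detections from one camera vs. 7 from a neighbouring one;
# pairs above a threshold would be merged rather than NMS-suppressed.
matcher = SiameseMatcher()
sim = matcher(torch.randn(5, 128), torch.randn(7, 128))
print(sim.shape, (sim > 0.8).sum().item())
```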
arXiv Detail & Related papers (2020-02-19T15:32:38Z)