A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection
- URL: http://arxiv.org/abs/2504.18419v1
- Date: Fri, 25 Apr 2025 15:28:53 GMT
- Title: A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection
- Authors: Carlo Sgaravatti, Roberto Basla, Riccardo Pieroni, Matteo Corno, Sergio M. Savaresi, Luca Magri, Giacomo Boracchi
- Abstract summary: We present a new way to detect 3D objects from multimodal inputs, leveraging both LiDAR and RGB cameras in a hybrid late-cascade scheme. We exploit late fusion principles to reduce LiDAR False Positives, matching LiDAR detections with RGB ones by projecting the LiDAR bounding boxes on the image. We evaluate our results on the KITTI object detection benchmark, showing significant performance improvements.
- Score: 6.399439052541506
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a new way to detect 3D objects from multimodal inputs, leveraging both LiDAR and RGB cameras in a hybrid late-cascade scheme, that combines an RGB detection network and a 3D LiDAR detector. We exploit late fusion principles to reduce LiDAR False Positives, matching LiDAR detections with RGB ones by projecting the LiDAR bounding boxes on the image. We rely on cascade fusion principles to recover LiDAR False Negatives leveraging epipolar constraints and frustums generated by RGB detections of separate views. Our solution can be plugged on top of any underlying single-modal detectors, enabling a flexible training process that can take advantage of pre-trained LiDAR and RGB detectors, or train the two branches separately. We evaluate our results on the KITTI object detection benchmark, showing significant performance improvements, especially for the detection of Pedestrians and Cyclists.
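The two geometric operations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a KITTI-like camera frame (x right, y down, z forward), a single 3x4 projection matrix `P`, LiDAR boxes given as dicts with hypothetical `center`/`size`/`yaw` keys already expressed in the camera frame, and a greedy "any RGB box with IoU >= 0.5" match. All function names are hypothetical. It omits class-wise matching and the epipolar reasoning across separate views. The first part is the late-fusion filter (project each LiDAR box onto the image and keep it only if it overlaps an RGB detection). The second part selects the LiDAR points inside the frustum of an RGB box, the kind of region a cascade stage could search for missed objects.

```python
import numpy as np


def box3d_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box in the camera frame.

    Assumed conventions (KITTI-like): x right, y down, z forward;
    size = (length, height, width); yaw rotates about the y axis;
    center is the box's geometric center.
    """
    l, h, w = size
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot @ np.vstack([x, y, z])).T + np.asarray(center, dtype=float)


def project_to_image(points, P):
    """Project Nx3 camera-frame points to pixels with a 3x4 matrix P."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    uvw = homo @ P.T
    # Points with zero depth would divide by zero; silence those warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return uvw[:, :2] / uvw[:, 2:3]


def iou_2d(a, b):
    """Intersection over union of two (x1, y1, x2, y2) rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def late_fusion_filter(lidar_boxes, rgb_boxes, P, iou_thr=0.5):
    """Keep LiDAR boxes whose image projection overlaps some RGB box.

    Unmatched LiDAR boxes are treated as likely false positives.
    Boxes are assumed to lie fully in front of the camera.
    """
    kept = []
    for box in lidar_boxes:
        corners = box3d_corners(box["center"], box["size"], box["yaw"])
        uv = project_to_image(corners, P)
        # Tightest axis-aligned rectangle around the 8 projected corners.
        rect = (*uv.min(axis=0), *uv.max(axis=0))
        if any(iou_2d(rect, r) >= iou_thr for r in rgb_boxes):
            kept.append(box)
    return kept


def points_in_frustum(points, P, rect):
    """Mask of camera-frame points lying in the frustum of a 2D box."""
    uv = project_to_image(points, P)
    u, v = uv[:, 0], uv[:, 1]
    in_front = points[:, 2] > 0  # discard points behind the camera
    return in_front & (u >= rect[0]) & (u <= rect[2]) & (v >= rect[1]) & (v <= rect[3])
```

For example, take a pinhole `P` with a focal length of 700 px and a principal point at (600, 180). A 4 m long box centered 10 m ahead projects to roughly (448, 123, 752, 237), so it survives an RGB detection at (450, 125, 750, 235). The same box shifted 10 m to the right projects far outside that detection and is dropped.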
Related papers
- Adaptive LiDAR Scanning: Harnessing Temporal Cues for Efficient 3D Object Detection via Multi-Modal Fusion [11.351728925952193]
Conventional LiDAR sensors perform dense, stateless scans, ignoring the strong temporal continuity in real-world scenes.
We propose a predictive, history-aware adaptive scanning framework that anticipates informative regions of interest based on past observations.
Our method significantly reduces unnecessary data acquisition by concentrating dense LiDAR scanning only within these ROIs and sparsely sampling elsewhere.
(2025-08-03T03:20:36Z)
- Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes [59.78696921486972]
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy.
We propose a MultiStream Detection (MuStD) network that meticulously extracts task-relevant information from both data modalities.
(2025-07-25T14:20:16Z)
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate transmission line (TL) detection.
Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
(2025-01-25T06:21:06Z)
- VaLID: Verification as Late Integration of Detections for LiDAR-Camera Fusion [2.503388496100123]
Vehicle object detection benefits from both LiDAR and camera data.
We propose a model-adaptive late-fusion method, VaLID, which validates whether each predicted bounding box is acceptable.
Our approach is model-adaptive and demonstrates state-of-the-art competitive performance even when using generic camera detectors.
(2024-09-23T20:27:10Z)
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer from inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
(2024-04-08T01:38:43Z)
- LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection [7.505655376776177]
We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection.
We design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps.
We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion achieves notable improvement over existing methods.
(2024-02-18T23:29:28Z)
- Long-Tailed 3D Detection via Multi-Modal Fusion [47.03801888003686]
We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates all annotated classes, including those in the tail.
We point out that rare-class accuracy is particularly improved via multi-modal late fusion (MMLF) of independently trained uni-modal LiDAR and RGB detectors.
Our proposed MMLF approach significantly improves LT3D performance over prior work, particularly improving rare class performance from 12.8 to 20.0 mAP!
(2023-12-18T07:14:25Z)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
(2023-06-02T10:57:41Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
(2023-03-01T15:48:27Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
(2022-06-30T01:44:30Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that a range-view (RV) 3D detector using only standard 2D convolutions can achieve performance comparable to state-of-the-art bird's-eye-view (BEV) detectors.
(2022-05-27T05:42:16Z)