Related papers: LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving

LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving

URL: http://arxiv.org/abs/2601.09812v1
Date: Wed, 14 Jan 2026 19:19:37 GMT
Title: LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving
Authors: Carlo Sgaravatti, Riccardo Pieroni, Matteo Corno, Sergio M. Savaresi, Luca Magri, Giacomo Boracchi,
Abstract summary: Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in Autonomous Driving.<n>We propose LCF3D, a novel sensor fusion framework that combines a 2D object detector on RGB images with a 3D object detector on LiDAR point clouds.
Score: 18.92466723080506
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in Autonomous Driving. To ensure high detection performance, Autonomous Vehicles complement RGB cameras with LiDAR sensors, but effectively combining these data sources for 3D object detection remains challenging. We propose LCF3D, a novel sensor fusion framework that combines a 2D object detector on RGB images with a 3D object detector on LiDAR point clouds. By leveraging multimodal fusion principles, we compensate for inaccuracies in the LiDAR object detection network. Our solution combines two key principles: (i) late fusion, to reduce LiDAR False Positives by matching LiDAR 3D detections with RGB 2D detections and filtering out unmatched LiDAR detections; and (ii) cascade fusion, to recover missed objects from LiDAR by generating new 3D frustum proposals corresponding to unmatched RGB detections. Experiments show that LCF3D is beneficial for domain generalization, as it turns out to be successful in handling different sensor configurations between training and testing domains. LCF3D achieves significant improvements over LiDAR-based methods, particularly for challenging categories like pedestrians and cyclists in the KITTI dataset, as well as motorcycles and bicycles in nuScenes. Code can be downloaded from: https://github.com/CarloSgaravatti/LCF3D.

Related papers

Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes [59.78696921486972]
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy.<n>We propose a MultiStream Detection (MuStD) network, that meticulously extracts task-relevant information from both data modalities.
arXiv Detail & Related papers (2025-07-25T14:20:16Z)
Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points, that can be obtained from a low-cost, low-resolution sensor. The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
Long-Tailed 3D Detection via Multi-Modal Fusion [58.89765900064689]
We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates all annotated classes, including those in-the-tail.<n>We point out that rare-class accuracy is particularly improved via multi-modal late fusion (MMLF) of independently trained uni-modal LiDAR and RGB detectors.<n>Our MMLF significantly outperforms prior work for LT3D, particularly improving on the six rarest classes from 12.8 to 20.0 mAP!
arXiv Detail & Related papers (2023-12-18T07:14:25Z)
Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combing their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer architecture to fuse image and LiDAR features to improve the detection accuracy. Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes. MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian algorithm based bipartite matching and set-to-set loss inspired by DETR.
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm to combine accurate point clouds with dense but less accurate point clouds obtained from stereo pairs. We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
arXiv Detail & Related papers (2021-11-08T19:29:59Z)
Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimationand 3D Object Detection [3.5488685789514736]
SLS-Fusion is a new approach to fuse data from 4-beam LiDAR and a stereo camera via a neural network for depth estimation. Since 4-beam LiDAR is cheaper than the well-known 64-beam LiDAR, this approach is also classified as a low-cost sensors-based method.
arXiv Detail & Related papers (2021-03-05T23:10:09Z)
End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs. We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.