FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention
- URL: http://arxiv.org/abs/2405.11682v1
- Date: Sun, 19 May 2024 21:52:50 GMT
- Title: FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention
- Authors: Ziang Guo, Zakhar Yagudin, Selamawit Asfaw, Artem Lykov, Dzmitry Tsetserukou
- Abstract summary: Camera, LiDAR and radar are common perception sensors for autonomous driving tasks.
To exploit their abilities wisely remains a challenge because each of these sensors has its own characteristics.
We propose FADet, a multi-sensor 3D detection network, which specifically studies the characteristics of different sensors.
- Score: 2.332328100695052
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Camera, LiDAR and radar are common perception sensors for autonomous driving tasks. Robust 3D object detection is best achieved by fusing these sensors. Exploiting their abilities wisely remains a challenge because each of these sensors has its own characteristics. In this paper, we propose FADet, a multi-sensor 3D detection network that specifically studies the characteristics of different sensors through our local featured attention modules. For camera images, we propose a dual-attention-based sub-module. For LiDAR point clouds, a triple-attention-based sub-module is used, while a mixed-attention-based sub-module is applied to the features of radar points. With these local featured attention sub-modules, FADet achieves effective detection results in long-tail and complex scenes from camera, LiDAR and radar input. On the NuScenes validation dataset, FADet achieves state-of-the-art performance on LiDAR-camera object detection tasks with 71.8% NDS and 69.0% mAP and, at the same time, on radar-camera object detection tasks with 51.7% NDS and 40.3% mAP. Code will be released at https://github.com/ZionGo6/FADet.
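For readers unfamiliar with the NDS figures quoted in the abstract, the following is a minimal sketch of how the nuScenes detection score combines mAP with the five mean true-positive error metrics: NDS = (5 * mAP + sum of (1 - min(1, error))) / 10. The function name and the error values below are hypothetical and chosen only for illustration; the abstract does not report the individual error terms, so this does not reproduce FADet's published numbers.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Combine mAP and the five mean true-positive errors into NDS.

    mean_ap   -- mean average precision as a fraction in [0, 1]
    tp_errors -- dict with the five nuScenes error metrics
                 (mATE, mASE, mAOE, mAVE, mAAE); lower is better
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"tp_errors must have exactly the keys {sorted(expected)}")
    # Each error is clamped to 1 and turned into a score where higher is better.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries a weight of 5; each of the five error scores a weight of 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical error values for illustration only (not taken from the paper).
example_errors = {"mATE": 0.30, "mASE": 0.25, "mAOE": 0.30, "mAVE": 0.25, "mAAE": 0.20}
# 0.69 is the LiDAR-camera mAP quoted in the abstract.
nds = nuscenes_detection_score(0.69, example_errors)
print(f"NDS = {nds:.3f}")
```

Because mAP takes half of the total weight, two detectors with the same mAP can still differ in NDS through their translation, scale, orientation, velocity and attribute errors.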
Related papers
- RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.
Radar suffers from noise and positional ambiguity.
We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
- RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network [34.45694077040797]
We present a radar-camera fusion 3D object detection framework called RCBEVDet.
RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature.
Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
arXiv Detail & Related papers (2024-09-08T05:14:27Z)
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
- CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking [40.630532348405595]
Camera-RADAR 3D Detection and Tracking (CR3DT) is a camera-RADAR fusion model for 3D object detection and Multi-Object Tracking (MOT).
Building upon the foundations of the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT demonstrates substantial improvements in both detection and tracking capabilities.
arXiv Detail & Related papers (2024-03-22T16:06:05Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [12.557361522985898]
We propose a camera-radar matching network CramNet to fuse the sensor readings from camera and radar in a joint 3D space.
Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle.
arXiv Detail & Related papers (2022-10-17T17:18:47Z)
- A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z)
- Rethinking of Radar's Role: A Camera-Radar Dataset and Systematic Annotator via Coordinate Alignment [38.24705460170415]
We propose a new dataset, named CRUW, with a systematic annotator and performance evaluation system.
CRUW aims to classify and localize the objects in 3D purely from radar's radio frequency (RF) images.
To the best of our knowledge, CRUW is the first public large-scale dataset with a systematic annotation and evaluation system.
arXiv Detail & Related papers (2021-05-11T17:13:45Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics.
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm by projecting a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)