DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
- URL: http://arxiv.org/abs/2404.03015v2
- Date: Wed, 27 Nov 2024 16:50:46 GMT
- Title: DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
- Authors: Felix Fent, Andras Palffy, Holger Caesar,
- Abstract summary: We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT).
Our method leverages lower-level radar data (the radar cube) instead of the processed point clouds to preserve as much information as possible.
DPFT has demonstrated state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions.
- Abstract: The perception of autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and the performance of radar-based perception is still inferior to the others. Camera-radar fusion methods have been proposed to address this issue, but these are constrained by the typical sparsity of radar point clouds and often designed for radars without elevation information. We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT), designed to overcome these limitations. Our method leverages lower-level radar data (the radar cube) instead of the processed point clouds to preserve as much information as possible and employs projections in both the camera and ground planes to effectively use radars with elevation information and simplify the fusion with camera data. As a result, DPFT has demonstrated state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions and maintaining a low inference time. The code is made available as open-source software under https://github.com/TUMFTM/DPFT.
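To make the dual-perspective idea more concrete, the sketch below shows one plausible way to collapse a radar cube onto a camera-plane (azimuth-elevation) view and a ground-plane (range-azimuth) view, and to fuse both views with camera features through a shared set of object queries. It is a minimal PyTorch illustration only: the module names, tensor shapes, pooling choices, and the use of plain multi-head attention are assumptions for readability, not the official DPFT implementation (see the linked repository for that).

```python
# Minimal sketch, not the official DPFT code: collapse a radar cube onto two planes and
# fuse both projections with camera features via attention over shared object queries.
import torch
import torch.nn as nn

class DualPerspectiveFusionSketch(nn.Module):
    def __init__(self, d_model=128, num_queries=100):
        super().__init__()
        # Lightweight encoders for each projected radar view and the camera image (illustrative).
        self.front_enc = nn.Conv2d(1, d_model, 3, padding=1)  # camera-plane view (azimuth x elevation)
        self.bev_enc = nn.Conv2d(1, d_model, 3, padding=1)    # ground-plane view (range x azimuth)
        self.cam_enc = nn.Conv2d(3, d_model, 3, padding=1)    # RGB image
        self.queries = nn.Embedding(num_queries, d_model)     # learned object queries
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.head = nn.Linear(d_model, 8)                     # e.g. 7 box parameters + 1 class score

    def forward(self, radar_cube, image):
        # radar_cube: (B, range, azimuth, elevation) power values; image: (B, 3, H, W)
        front = radar_cube.amax(dim=1, keepdim=True)           # max over range -> camera-plane view
        bev = radar_cube.amax(dim=3).unsqueeze(1)               # max over elevation -> ground-plane view
        tokens = []
        for enc, x in ((self.front_enc, front), (self.bev_enc, bev), (self.cam_enc, image)):
            feat = enc(x)                                       # (B, C, H', W')
            tokens.append(feat.flatten(2).transpose(1, 2))      # (B, H'*W', C) token sequence
        memory = torch.cat(tokens, dim=1)                       # all perspectives share one key/value set
        q = self.queries.weight.unsqueeze(0).expand(image.size(0), -1, -1)
        fused, _ = self.attn(q, memory, memory)                 # queries attend jointly to all views
        return self.head(fused)                                 # per-query detection outputs


# Toy usage with random tensors standing in for a radar cube and a camera image.
model = DualPerspectiveFusionSketch()
out = model(torch.rand(2, 64, 128, 16), torch.rand(2, 3, 64, 96))  # -> (2, 100, 8)
```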
Related papers
- RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.
Radar suffers from noise and positional ambiguity.
We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
- TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection [6.163747364795787]
We present TransRAD, a novel 3D radar object detection model.
We propose Location-Aware NMS to mitigate the common issue of duplicate bounding boxes in deep radar object detection.
Results demonstrate that TransRAD outperforms state-of-the-art methods in both 2D and 3D radar detection tasks.
arXiv Detail & Related papers (2025-01-29T20:21:41Z)
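As a rough illustration of the location-aware suppression idea described for TransRAD above, the hedged sketch below extends classical NMS with a center-distance criterion so that near-duplicate boxes are removed even when their IoU is low. The threshold names and the exact combination rule are assumptions; the paper's Location-Aware NMS may differ.

```python
# Hedged sketch of a "location-aware" NMS variant: in addition to IoU, boxes whose centers
# fall within a small radius of an already-kept box are suppressed. This illustrates the
# general idea only, not the exact criterion used by TransRAD.
import numpy as np

def location_aware_nms(boxes, scores, iou_thr=0.5, center_radius=1.0):
    """boxes: (N, 4) as [x1, y1, x2, y2] in meters (BEV); scores: (N,)."""
    order = np.argsort(scores)[::-1]                  # indices sorted by descending score
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Standard IoU overlap with the highest-scoring remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Location term: suppress near-duplicate centers even when IoU is low.
        dist = np.linalg.norm(centers[rest] - centers[i], axis=1)
        order = rest[(iou <= iou_thr) & (dist > center_radius)]
    return keep
```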
- HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection [10.91039672865197]
Millimeter-wave radar plays a vital role in 3D object detection for autonomous driving.
Radar point clouds suffer from pronounced sparsity and unavoidable angle estimation errors.
Direct fusion of radar and camera data can lead to negative or even opposite effects.
arXiv Detail & Related papers (2024-12-16T07:06:17Z)
- A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data [7.2508100569856975]
We use the raw range-Doppler (RD) spectrum of the radar data together with camera images.
We extract the corresponding image features with our camera encoder-decoder architecture.
The resultant feature maps are fused with range-azimuth features, recovered from the RD spectrum input, to perform object detection.
arXiv Detail & Related papers (2024-11-20T13:26:13Z)
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
- RadarFormer: Lightweight and Accurate Real-Time Radar Object Detection Model [13.214257841152033]
Radar-centric datasets receive comparatively little attention in the development of deep learning techniques for radar perception.
We propose a transformer-based model, named RadarFormer, that utilizes state-of-the-art developments in vision deep learning.
Our model also introduces a channel-chirp-time merging module that reduces the size and complexity of our models by more than 10 times without compromising accuracy.
arXiv Detail & Related papers (2023-04-17T17:07:35Z)
- CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [12.557361522985898]
We propose a camera-radar matching network, CramNet, to fuse the sensor readings from camera and radar in a joint 3D space.
Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle.
arXiv Detail & Related papers (2022-10-17T17:18:47Z)
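The training trick behind CramNet's robustness to a failing sensor can be sketched generically: with some probability, one sensor branch is zeroed out so the fusion head learns to produce detections from the remaining modality. The probabilities and the zeroing strategy below are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of sensor modality dropout during training: with probability p_drop,
# either the camera or the radar feature map is zeroed so the fused detector learns
# to cope with a missing sensor. Values and strategy are illustrative assumptions.
import torch

def modality_dropout(cam_feat, radar_feat, p_drop=0.2, training=True):
    """cam_feat, radar_feat: (B, C, H, W) feature maps from the two sensor branches."""
    if not training:
        return cam_feat, radar_feat
    if torch.rand(()) < p_drop:
        # Drop exactly one modality at a time so the model never loses both sensors.
        if torch.rand(()) < 0.5:
            cam_feat = torch.zeros_like(cam_feat)
        else:
            radar_feat = torch.zeros_like(radar_feat)
    return cam_feat, radar_feat
```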
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- LiRaNet: End-to-End Trajectory Prediction using Spatio-Temporal Radar Fusion [52.59664614744447]
We present LiRaNet, a novel end-to-end trajectory prediction method which utilizes radar sensor information along with widely used lidar and high definition (HD) maps.
Automotive radar provides rich, complementary information, allowing for longer-range vehicle detection as well as instantaneous velocity measurements.
arXiv Detail & Related papers (2020-10-02T00:13:00Z)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
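To illustrate the general early-plus-late fusion pattern mentioned for RadarNet above, the sketch below first concatenates rasterized LiDAR and radar BEV grids before a shared backbone (early fusion) and then re-weights individual radar returns against a detection feature with a learned attention score (late fusion). Module names, channel counts, and the attention form are assumptions, not the paper's architecture.

```python
# Hedged sketch of voxel/BEV-level early fusion followed by attention-based late fusion.
import torch
import torch.nn as nn

class EarlyLateFusionSketch(nn.Module):
    def __init__(self, lidar_ch=32, radar_ch=8, d_model=64):
        super().__init__()
        # Early fusion: concatenate rasterized BEV grids and process them jointly.
        self.backbone = nn.Sequential(
            nn.Conv2d(lidar_ch + radar_ch, d_model, 3, padding=1), nn.ReLU(),
            nn.Conv2d(d_model, d_model, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(d_model, 6, 1)    # per-cell box/velocity regression (illustrative)
        # Late fusion: score each radar return against a detection's feature vector.
        self.radar_proj = nn.Linear(3, d_model)      # (x, y, radial velocity) -> feature
        self.score = nn.Linear(2 * d_model, 1)

    def detect(self, lidar_bev, radar_bev):
        fused_bev = self.backbone(torch.cat([lidar_bev, radar_bev], dim=1))  # early fusion
        return fused_bev, self.det_head(fused_bev)

    def late_fuse(self, det_feat, radar_points):
        # det_feat: (d_model,) feature of one detection; radar_points: (N, 3) nearby returns.
        r = self.radar_proj(radar_points)                               # (N, d_model)
        pair = torch.cat([det_feat.expand(r.size(0), -1), r], dim=1)    # pair each return with the box
        weights = torch.softmax(self.score(pair), dim=0)                # attention over radar returns
        return (weights * r).sum(dim=0)                                 # aggregated radar evidence


net = EarlyLateFusionSketch()
fused_bev, dets = net.detect(torch.rand(1, 32, 64, 64), torch.rand(1, 8, 64, 64))
refined = net.late_fuse(fused_bev[0, :, 30, 30], torch.rand(5, 3))  # refine one detected cell
```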