Mask-RadarNet: Enhancing Transformer With Spatial-Temporal Semantic Context for Radar Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2412.15595v1
- Date: Fri, 20 Dec 2024 06:39:40 GMT
- Title: Mask-RadarNet: Enhancing Transformer With Spatial-Temporal Semantic Context for Radar Object Detection in Autonomous Driving
- Authors: Yuzhi Wu, Jun Liu, Guangfeng Jiang, Weijian Liu, Danilo Orlando,
- Abstract summary: We propose a model called Mask-RadarNet to fully utilize the hierarchical semantic features from the input radar data.
Mask-RadarNet exploits the combination of interleaved convolution and attention operations to replace the traditional architecture in transformer-based models.
With relatively lower computational complexity and fewer parameters, the proposed Mask-RadarNet achieves higher recognition accuracy for object detection in autonomous driving.
- Score: 11.221694136475554
- Abstract: As a cost-effective and robust technology, automotive radar has seen steady improvement in recent years, making it an appealing complement to commonly used sensors such as cameras and LiDAR in autonomous driving. Radio frequency data with rich semantic information are attracting more and more attention. Most current radar-based models take radio frequency image sequences as the input. However, these models heavily rely on convolutional neural networks and leave out the spatial-temporal semantic context during the encoding stage. To solve these problems, we propose a model called Mask-RadarNet to fully utilize the hierarchical semantic features from the input radar data. Mask-RadarNet exploits the combination of interleaved convolution and attention operations to replace the traditional architecture in transformer-based models. In addition, patch shift is introduced to the Mask-RadarNet for efficient spatial-temporal feature learning. By shifting part of the patches with a specific mosaic pattern in the temporal dimension, Mask-RadarNet achieves competitive performance while reducing the computational burden of the spatial-temporal modeling. In order to capture the spatial-temporal semantic contextual information, we design the class masking attention module (CMAM) in our encoder. Moreover, a lightweight auxiliary decoder is added to our model to aggregate prior maps generated from the CMAM. Experiments on the CRUW dataset demonstrate the superiority of the proposed method over several state-of-the-art radar-based object detection algorithms. With relatively lower computational complexity and fewer parameters, the proposed Mask-RadarNet achieves higher recognition accuracy for object detection in autonomous driving.
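The patch-shift idea described in the abstract (mixing temporal information by moving part of the patch grid in time, so attention sees multiple frames at no extra cost) can be sketched in a few lines. The checkerboard mosaic, the single-frame offset, and the circular temporal padding below are illustrative assumptions, not the paper's exact mosaic pattern.

```python
import numpy as np

def patch_shift(x: np.ndarray) -> np.ndarray:
    """Shift part of a patch grid along the temporal axis.

    x has shape (T, H, W, C): T frames, an H x W grid of patch
    embeddings per frame, C channels. Patches on one colour of a
    checkerboard "mosaic" are replaced by the corresponding patches
    from the previous frame, while the rest stay in place, so later
    attention layers see a zero-FLOP blend of two time steps.
    """
    T, H, W, C = x.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    mask = (rows + cols) % 2 == 0           # checkerboard mosaic, (H, W)

    shifted = np.roll(x, shift=1, axis=0)   # frame t receives frame t-1
    out = x.copy()
    out[:, mask, :] = shifted[:, mask, :]   # swap in the shifted patches
    return out
```

The output keeps the input's shape, so the operation drops into a transformer block without changing any downstream tensor sizes.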
Related papers
- SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data [5.344444942640663]
Radar raw data often contains excessive noise, whereas radar point clouds retain only limited information.
We introduce an adaptive subsampling method together with a tailored network architecture that exploits the sparsity patterns.
Experiments on the RADIal dataset show that our SparseRadNet exceeds state-of-the-art (SOTA) performance in object detection and achieves close to SOTA accuracy in freespace segmentation.
arXiv Detail & Related papers (2024-06-15T11:26:10Z)
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal Prediction [55.30913411696375]
We propose an Asymmetric Receptive Field Autoencoder (ARFA) model, which introduces corresponding sizes of receptive field modules.
In the encoder, we present a large-kernel module for global temporal feature extraction. In the decoder, we develop a small-kernel module for local temporal reconstruction.
We construct the RainBench, a large-scale radar echo dataset for precipitation prediction, to address the scarcity of meteorological data in the domain.
arXiv Detail & Related papers (2023-09-01T07:55:53Z)
- Semantic Segmentation of Radar Detections using Convolutions on Point Clouds [59.45414406974091]
We introduce a deep-learning based method to convolve radar detections into point clouds.
We adapt this algorithm to radar-specific properties through distance-dependent clustering and pre-processing of input point clouds.
Our network outperforms state-of-the-art approaches that are based on PointNet++ on the task of semantic segmentation of radar point clouds.
arXiv Detail & Related papers (2023-05-22T07:09:35Z)
- Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps [0.0]
We propose a deep learning-based Inverse Sensor Model (ISM) to learn the mapping from sparse radar detections to polar measurement grids.
Our approach is the first one to learn a single-frame measurement grid in the polar scheme from radars with a limited Field Of View.
This enables us to flexibly use one or more radar sensors without network retraining and without requirements on 360° sensor coverage.
arXiv Detail & Related papers (2023-05-21T09:09:23Z)
- A recurrent CNN for online object detection on raw radar frames [7.074916574419171]
This work presents a new recurrent CNN architecture for online radar object detection.
We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn dependencies between successive frames.
Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects.
arXiv Detail & Related papers (2022-12-21T16:36:36Z)
- Complex-valued Convolutional Neural Networks for Enhanced Radar Signal Denoising and Interference Mitigation [73.0103413636673]
We propose the use of Complex-Valued Convolutional Neural Networks (CVCNNs) to address the issue of mutual interference between radar sensors.
CVCNNs increase data efficiency, speed up network training, and substantially improve the preservation of phase information during interference removal.
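The phase preservation comes from performing convolution with true complex arithmetic: one complex convolution expands into four real convolutions whose cross terms carry the phase. A minimal sketch of that expansion (the toy 1-D filter below is illustrative, not a learned weight from the paper):

```python
import numpy as np

def complex_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Complex-valued 1-D convolution, the building block of a CVCNN.

    (xr + j*xi) * (wr + j*wi) expands into four real convolutions;
    the two cross terms are exactly what a real-valued CNN applied to
    stacked real/imag channels cannot reproduce, and they are what
    keeps the phase of the radar signal intact.
    """
    real = np.convolve(x.real, w.real, "same") - np.convolve(x.imag, w.imag, "same")
    imag = np.convolve(x.real, w.imag, "same") + np.convolve(x.imag, w.real, "same")
    return real + 1j * imag
```

Because convolution is bilinear, this four-term expansion agrees with NumPy's native complex convolution, so the phase of the filtered signal is preserved exactly.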
arXiv Detail & Related papers (2021-04-29T10:06:29Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Quantized Neural Networks for Radar Interference Mitigation [14.540226579203207]
CNN-based approaches for denoising and interference mitigation yield promising results for radar processing.
We investigate quantization techniques for CNN-based denoising and interference mitigation of radar signals.
arXiv Detail & Related papers (2020-11-25T13:18:06Z)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.