AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
- URL: http://arxiv.org/abs/2504.00559v1
- Date: Tue, 01 Apr 2025 09:10:47 GMT
- Title: AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
- Authors: Loveneet Saini, Mirko Meuter, Hasan Tercan, Tobias Meisen,
- Abstract summary: Bird's-eye view (BEV) object detection has become important for advanced automotive 3D radar-based perception systems.<n>In this paper, we introduce AttenRUtive, a novel attention-based recurrent approach tailored for radar constraints.
- Score: 5.5967570276373655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bird's-eye view (BEV) object detection has become important for advanced automotive 3D radar-based perception systems. However, the inherently sparse and non-deterministic nature of radar data limits the effectiveness of traditional single-frame BEV paradigms. In this paper, we addresses this limitation by introducing AttentiveGRU, a novel attention-based recurrent approach tailored for radar constraints, which extracts individualized spatio-temporal context for objects by dynamically identifying and fusing temporally correlated structures across present and memory states. By leveraging the consistency of object's latent representation over time, our approach exploits temporal relations to enrich feature representations for both stationary and moving objects, thereby enhancing detection performance and eliminating the need for externally providing or estimating any information about ego vehicle motion. Our experimental results on the public nuScenes dataset show a significant increase in mAP for the car category by 21% over the best radar-only submission. Further evaluations on an additional dataset demonstrate notable improvements in object detection capabilities, underscoring the applicability and effectiveness of our method.
Related papers
- BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation [3.613463012025065]
We introduce BEVMOSNet, the first end-to-end multimodal fusion leveraging cameras, LiDAR, and radar to precisely predict the moving objects in bird's-eye-view (BEV)<n>We show an overall improvement in IoU score of 36.59% compared to the vision-based unimodal baseline BEV-MoSeg.
arXiv Detail & Related papers (2025-03-05T09:03:46Z) - Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring [1.0128808054306184]
The study introduces a S-Temporal Vehicle Detection dataset (STVD) containing 600 sequential frame-based images by UAVs.
A YOLO object detection algorithm is enhanced to incorporate temporal dynamics, resulting in improved performance over single frame models.
arXiv Detail & Related papers (2024-10-17T14:49:37Z) - Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, namely LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
We have conducted experiments on the aggregation and nuScenes datasets to demonstrate that the proposed framework achieves superior 3D detection performance.
arXiv Detail & Related papers (2024-09-06T16:29:04Z) - Leveraging Self-Supervised Instance Contrastive Learning for Radar
Object Detection [7.728838099011661]
This paper presents RiCL, an instance contrastive learning framework to pre-train radar object detectors.
We aim to pre-train an object detector's backbone, head and neck to learn with fewer data.
arXiv Detail & Related papers (2024-02-13T12:53:33Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - Implicit Occupancy Flow Fields for Perception and Prediction in
Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z) - Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing
Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z) - RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint
3D Object Detection and Motion Forecasting [13.544498422625448]
We present RV-FuseNet, a novel end-to-end approach for joint detection and trajectory estimation.
Instead of the widely used bird's eye view (BEV) representation, we utilize the native range view (RV) representation of LiDAR data.
We show that our approach significantly improves motion forecasting performance over the existing state-of-the-art.
arXiv Detail & Related papers (2020-05-21T19:22:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.