DenseBEV: Transforming BEV Grid Cells into 3D Objects
- URL: http://arxiv.org/abs/2512.16818v1
- Date: Thu, 18 Dec 2025 17:59:22 GMT
- Title: DenseBEV: Transforming BEV Grid Cells into 3D Objects
- Authors: Marius Dähling, Sebastian Krebs, J. Marius Zöllner,
- Abstract summary: Bird's-Eye-View (BEV)-based transformers are increasingly utilized for multi-camera 3D object detection.<n>Recent advancements complement or replace these random queries with detections from auxiliary networks.<n>We propose a more intuitive and efficient approach by using BEV feature cells directly as anchors.
- Score: 10.619058888618051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In current research, Bird's-Eye-View (BEV)-based transformers are increasingly utilized for multi-camera 3D object detection. Traditional models often employ random queries as anchors, optimizing them successively. Recent advancements complement or replace these random queries with detections from auxiliary networks. We propose a more intuitive and efficient approach by using BEV feature cells directly as anchors. This end-to-end approach leverages the dense grid of BEV queries, considering each cell as a potential object for the final detection task. As a result, we introduce a novel two-stage anchor generation method specifically designed for multi-camera 3D object detection. To address the scaling issues of attention with a large number of queries, we apply BEV-based Non-Maximum Suppression, allowing gradients to flow only through non-suppressed objects. This ensures efficient training without the need for post-processing. By using BEV features from encoders such as BEVFormer directly as object queries, temporal BEV information is inherently embedded. Building on the temporal BEV information already embedded in our object queries, we introduce a hybrid temporal modeling approach by integrating prior detections to further enhance detection performance. Evaluating our method on the nuScenes dataset shows consistent and significant improvements in NDS and mAP over the baseline, even with sparser BEV grids and therefore fewer initial anchors. It is particularly effective for small objects, enhancing pedestrian detection with a 3.8% mAP increase on nuScenes and an 8% increase in LET-mAP on Waymo. Applying our method, named DenseBEV, to the challenging Waymo Open dataset yields state-of-the-art performance, achieving a LET-mAP of 60.7%, surpassing the previous best by 5.4%. Code is available at https://github.com/mdaehl/DenseBEV.
Related papers
- RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection.<n>RaCFormer achieves superior results of 64.9% mAP and 70.2% on nuScenes datasets.
arXiv Detail & Related papers (2024-12-17T09:47:48Z) - Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments [67.83787474506073]
We tackle the limitations of current LiDAR-based 3D object detection systems.
We introduce a universal textscFind n' Propagate approach for 3D OV tasks.
We achieve up to a 3.97-fold increase in Average Precision (AP) for novel object classes.
arXiv Detail & Related papers (2024-03-20T12:51:30Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection [47.7933708173225]
Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.
This paper introduces a "modernized" dense BEV framework dubbed BEVNeXt.
On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks.
arXiv Detail & Related papers (2023-12-04T07:35:02Z) - U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.<n>Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.<n>This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z) - QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts [2.949710700293865]
3D object detection plays a pivotal role in autonomous driving and robotics, demanding precise interpretation of Bird's Eye View (BEV) images.
We introduce a framework utilizing dynamic query evolution strategy, harnesses K-means and Top-K attention mechanisms.
Our evaluation showcases a marked improvement in detection accuracy, setting a new benchmark in the domain of query-based BEV object detection.
arXiv Detail & Related papers (2023-10-07T21:55:29Z) - UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities [7.470926069132259]
We propose an end-to-end multi-modal 3D object detection framework designed for robustness against missing modalities.
UniBEV can operate on LiDAR plus camera input, but also on LiDAR-only or camera-only input without retraining.
We compare UniBEV to state-of-the-art BEVFusion and MetaBEV on nuScenes over all sensor input combinations.
arXiv Detail & Related papers (2023-09-25T20:22:47Z) - SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera
Videos [20.51396212498941]
SparseBEV is a fully sparse 3D object detector that outperforms the dense counterparts.
On the test split of nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS.
arXiv Detail & Related papers (2023-08-18T02:11:01Z) - OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection [29.530177591608297]
Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost.
Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm.
We propose an Object-Centric query-BEV detector OCBEV, which can carve the temporal and spatial cues of moving targets more effectively.
arXiv Detail & Related papers (2023-06-02T17:59:48Z) - MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performances.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior arts by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z) - BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud
Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z) - RAANet: Range-Aware Attention Network for LiDAR-based 3D Object
Detection with Auxiliary Density Level Estimation [11.180128679075716]
Range-Aware Attention Network (RAANet) is developed for 3D object detection from LiDAR data for autonomous driving.
RAANet extracts more powerful BEV features and generates superior 3D object detections.
Experiments on nuScenes dataset demonstrate that our proposed approach outperforms the state-of-the-art methods for LiDAR-based 3D object detection.
arXiv Detail & Related papers (2021-11-18T04:20:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.