BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
- URL: http://arxiv.org/abs/2312.01696v2
- Date: Sun, 24 Mar 2024 14:50:42 GMT
- Title: BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
- Authors: Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu,
- Abstract summary: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.
This paper introduces a "modernized" dense BEV framework dubbed BEVNeXt.
On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks.
- Score: 47.7933708173225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This paper aims to address the drawbacks of the existing dense BEV-based 3D object detectors by introducing our proposed enhanced components, including a CRF-modulated depth estimation module enforcing object-level consistencies, a long-term temporal aggregation module with extended receptive fields, and a two-stage object decoder combining perspective techniques with CRF-modulated depth embedding. These enhancements lead to a "modernized" dense BEV framework dubbed BEVNeXt. On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks under various settings, achieving a state-of-the-art result of 64.2 NDS on the nuScenes test set. Code will be available at \url{https://github.com/woxihuanjiangguo/BEVNeXt}.
Related papers
- GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection [36.245654685143016]
Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection.
Existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state.
arXiv Detail & Related papers (2024-09-03T11:57:36Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework.
We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D.
For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode.
arXiv Detail & Related papers (2023-10-04T13:38:53Z) - SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera
Videos [20.51396212498941]
SparseBEV is a fully sparse 3D object detector that outperforms the dense counterparts.
On the test split of nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS.
arXiv Detail & Related papers (2023-08-18T02:11:01Z) - BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z) - LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation [38.38852904444365]
This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D.
Our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2023-04-04T04:05:56Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud
Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.