Delving into the Devils of Bird's-eye-view Perception: A Review,
Evaluation and Recipe
- URL: http://arxiv.org/abs/2209.05324v4
- Date: Wed, 27 Sep 2023 16:15:13 GMT
- Title: Delving into the Devils of Bird's-eye-view Perception: A Review,
Evaluation and Recipe
- Authors: Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie
Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie,
Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu,
Jianping Shi, Dahua Lin and Yu Qiao
- Abstract summary: Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
- Score: 115.31507979199564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning powerful representations in bird's-eye-view (BEV) for perception
tasks is trending and drawing extensive attention both from industry and
academia. Conventional approaches for most autonomous driving algorithms
perform detection, segmentation, tracking, etc., in a front or perspective
view. As sensor configurations get more complex, integrating multi-source
information from different sensors and representing features in a unified view
becomes vitally important. BEV perception offers several advantages:
representing surrounding scenes in BEV is intuitive and fusion-friendly, and
representing objects in BEV is most convenient for subsequent modules such as
planning and control. The core problems for BEV perception lie in (a) how to
reconstruct the lost 3D information via view transformation from perspective
view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how
to formulate the pipeline to incorporate features from different sources and
views; and (d) how to adapt and generalize algorithms as sensor configurations
vary across different scenarios. In this survey, we review the most recent
works on BEV perception and provide an in-depth analysis of different
solutions. Moreover, several systematic designs of BEV approach from the
industry are depicted as well. Furthermore, we introduce a full suite of
practical guidelines to improve the performance of BEV perception tasks,
covering camera, LiDAR and fusion inputs. Finally, we point out future
research directions in this area. We hope this report sheds some light on
these problems for the community and encourages more research effort on BEV
perception. We keep an
active repository to collect the most recent work and provide a toolbox for bag
of tricks at https://github.com/OpenDriveLab/Birds-eye-view-Perception
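Core problem (a), recovering 3D information via a perspective-to-BEV view transformation, is easiest to grasp through its simplest classical instance: inverse perspective mapping, which relates ground-plane BEV points to image pixels under a flat-ground assumption and known camera calibration. The sketch below is a minimal illustration of that geometry, not any specific method from the survey; the function name and coordinate conventions are illustrative assumptions.

```python
import numpy as np

def ground_to_image(points_bev, K, R, t):
    """Project ground-plane points (x, y, 0) in ego coordinates into image pixels.

    points_bev: (N, 2) array of (x, y) positions on the ground plane, in metres.
    K: (3, 3) camera intrinsic matrix.
    R, t: extrinsics mapping ego coordinates to camera coordinates (p_cam = R p_ego + t).
    Returns an (N, 2) array of pixel coordinates (u, v).
    """
    n = points_bev.shape[0]
    # Lift 2D BEV points onto the flat-ground plane z = 0.
    pts_ego = np.hstack([points_bev, np.zeros((n, 1))])   # (N, 3)
    pts_cam = R @ pts_ego.T + t.reshape(3, 1)             # (3, N)
    pix = K @ pts_cam                                     # homogeneous pixels
    uv = (pix[:2] / pix[2:3]).T                           # perspective divide
    return uv
```

Learned view transformers in the surveyed works replace this fixed geometric mapping, which breaks down for objects above the ground plane, with depth distributions or attention over image features, but the underlying pinhole projection they invert is the same.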
Related papers
- BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment [8.098296280937518]
We present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal.
By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment.
arXiv Detail & Related papers (2024-10-28T12:40:27Z)
- OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation.
arXiv Detail & Related papers (2024-07-18T03:48:22Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
- SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection [46.92706423094971]
We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features.
We also propose BEV-Paste, an effective data augmentation strategy that pairs well with the semantic-aware BEV features.
Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-21T10:28:19Z)
- Leveraging BEV Representation for 360-degree Visual Place Recognition [14.497501941931759]
This paper investigates the advantages of using the Bird's Eye View representation in 360-degree visual place recognition (VPR).
We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion.
The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets.
arXiv Detail & Related papers (2023-05-23T08:29:42Z)
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks [28.024042528077125]
Bird's-Eye-View (BEV) 3D Object Detection is a crucial multi-view technique for autonomous driving systems.
We propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights.
arXiv Detail & Related papers (2022-12-02T15:14:48Z)
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision [101.36648828734646]
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
arXiv Detail & Related papers (2022-11-18T18:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.