PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction
in Bird's-Eye View
- URL: http://arxiv.org/abs/2306.10761v1
- Date: Mon, 19 Jun 2023 08:11:05 GMT
- Title: PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction
in Bird's-Eye View
- Authors: Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius
Cordts and Juergen Gall
- Abstract summary: Bird's-eye view (BEV) representations are commonplace in perception for autonomous driving.
Existing approaches for BEV instance prediction rely on a multi-task auto-regressive setup coupled with post-processing to predict future instances.
We propose POWERBEV, a novel and efficient end-to-end framework that differs in several design choices aimed at reducing the inherent redundancy of previous methods.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurately perceiving instances and predicting their future motion are key
tasks for autonomous vehicles, enabling them to navigate safely in complex
urban traffic. While bird's-eye view (BEV) representations are commonplace in
perception for autonomous driving, their potential in a motion prediction
setting is less explored. Existing approaches for BEV instance prediction from
surround cameras rely on a multi-task auto-regressive setup coupled with
complex post-processing to predict future instances in a spatio-temporally
consistent manner. In this paper, we depart from this paradigm and propose
a novel and efficient end-to-end framework named POWERBEV, which differs in
several design choices aimed at reducing the inherent redundancy in previous
methods.
First, rather than predicting the future in an auto-regressive fashion,
POWERBEV uses a parallel, multi-scale module built from lightweight 2D
convolutional networks. Second, we show that segmentation and centripetal
backward flow are sufficient for prediction, simplifying previous multi-task
objectives by eliminating redundant output modalities. Building on this output
representation, we propose a simple, flow warping-based post-processing
approach which produces more stable instance associations across time. Through
this lightweight yet powerful design, POWERBEV outperforms state-of-the-art
baselines on the nuScenes dataset and poses an alternative paradigm for BEV
instance prediction. Our code is publicly available at:
https://github.com/EdwardLeeLPZ/PowerBEV.
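
To make the first design choice concrete, here is a minimal sketch of a parallel prediction head (single-scale for brevity; the paper uses a multi-scale variant, and all module and parameter names here are hypothetical, not the authors' implementation). Past BEV feature maps are stacked along the channel axis, and segmentation logits plus backward flow for all future frames are predicted at once with plain 2D convolutions, avoiding frame-by-frame auto-regressive recurrence:

```python
import torch
import torch.nn as nn

class ParallelBEVPredictor(nn.Module):
    """Illustrative parallel (non-auto-regressive) BEV prediction head."""

    def __init__(self, t_past=3, t_future=4, c_feat=64):
        super().__init__()
        self.t_future = t_future
        self.net = nn.Sequential(
            # time is folded into the channel dimension, so ordinary 2D
            # convolutions see all past frames simultaneously
            nn.Conv2d(t_past * c_feat, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # one segmentation logit + two flow channels per future frame
            nn.Conv2d(128, t_future * 3, kernel_size=1),
        )

    def forward(self, past_feats):
        # past_feats: (B, T_past, C, H, W)
        b, t, c, h, w = past_feats.shape
        out = self.net(past_feats.reshape(b, t * c, h, w))
        out = out.view(b, self.t_future, 3, h, w)
        seg_logits = out[:, :, :1]  # (B, T_future, 1, H, W)
        flow = out[:, :, 1:]        # (B, T_future, 2, H, W), backward flow
        return seg_logits, flow
```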
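The flow-warping-based association can be sketched similarly, assuming a per-pixel backward flow that points from each pixel at time t+1 back to its source location at time t (an illustrative approximation, not the paper's exact procedure). Instance IDs are simply pulled backward along the flow, so every future foreground pixel receives exactly one ID, avoiding the collisions and holes that forward warping can produce:

```python
import numpy as np

def propagate_ids(prev_ids, seg_mask, backward_flow):
    """Warp instance IDs from time t to time t+1 via backward flow.

    prev_ids:      (H, W) int instance IDs at time t (0 = background)
    seg_mask:      (H, W) bool foreground mask at time t+1
    backward_flow: (2, H, W) per-pixel (dy, dx) pointing back to time t
    """
    h, w = prev_ids.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # follow the backward flow to each pixel's source location at time t
    src_y = np.clip(np.round(ys + backward_flow[0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + backward_flow[1]).astype(int), 0, w - 1)
    ids = prev_ids[src_y, src_x]  # pull the ID each pixel came from
    ids[~seg_mask] = 0            # background stays unlabeled
    return ids
```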
Related papers
- CASPFormer: Trajectory Prediction from BEV Images with Deformable Attention
We propose Context Aware Scene Prediction Transformer (CASPFormer), which can perform multi-modal motion prediction from spatialized Bird-Eye-View (BEV) images.
Our system can be integrated with any upstream perception module that is capable of generating BEV images.
We evaluate our model on the nuScenes dataset and show that it reaches state-of-the-art across multiple metrics.
arXiv Detail & Related papers (2024-09-26T12:37:22Z)
- Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving
This paper proposes a novel hierarchical BEV perception paradigm, aiming to provide a library of fundamental perception modules and a user-friendly graphical interface.
We adopt a Pretrain-Finetune strategy to effectively utilize large-scale public datasets and streamline development processes.
We also present a Multi-Module Learning (MML) approach, enhancing performance through synergistic and iterative training of multiple models.
arXiv Detail & Related papers (2024-07-17T11:17:20Z)
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space
We present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View latent space for environment modeling.
Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction.
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
The future instance prediction from a Bird's Eye View (BEV) perspective is a vital component in autonomous driving.
We propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR).
arXiv Detail & Related papers (2024-04-19T13:08:43Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)