DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception
- URL: http://arxiv.org/abs/2303.08333v1
- Date: Wed, 15 Mar 2023 02:42:48 GMT
- Title: DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception
- Authors: Jiayu Zou, Zheng Zhu, Yun Ye, Xingang Wang
- Abstract summary: We propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion models to generate a more comprehensive BEV representation.
In practice, we design three types of conditions to guide the training of the diffusion model, which denoises the coarse samples and refines the semantic feature.
We show that DiffBEV achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach.
- Score: 14.968177102647783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BEV perception is of great importance in the field of autonomous driving,
serving as the cornerstone of planning, controlling, and motion prediction. The
quality of the BEV feature highly affects the performance of BEV perception.
However, given the noise in camera parameters and LiDAR scans, the BEV
representation we obtain is usually corrupted by harmful noise. Diffusion
models naturally have the ability to denoise noisy samples toward the ideal
data, which motivates us to utilize a diffusion model to obtain a better BEV
representation. In this work, we propose an end-to-end framework, named
DiffBEV, to exploit the potential of diffusion models to generate a more
comprehensive BEV representation. To the best of our knowledge, we are the
first to apply a diffusion model to BEV perception. In practice, we design
three types of conditions to guide the training of the diffusion model, which
denoises the coarse samples and refines the semantic feature in a progressive
way. Moreover, a cross-attention module is leveraged to fuse the context of
the BEV feature with the semantic content of the conditional diffusion model.
DiffBEV
achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the
best-performing existing approach. Quantitative and qualitative results on
multiple benchmarks demonstrate the effectiveness of DiffBEV in BEV semantic
segmentation and 3D object detection tasks. The code will be available soon.
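The abstract describes the architecture at a high level only. As a rough illustration, a conditional denoiser over BEV features combined with cross-attention fusion might look like the minimal sketch below; every module name, shape, and the single-step refinement are assumptions made for this example, not the authors' released code.

```python
# Hypothetical sketch of a DiffBEV-style pipeline: a conditional denoiser
# refines a noisy BEV feature map, and cross-attention fuses the refined
# semantic content back into the original BEV context.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Predicts the noise in a BEV feature, guided by a condition map."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, noisy_bev, condition):
        # One common conditioning choice: concatenate the condition
        # (e.g. a coarse BEV feature) channel-wise with the noisy sample.
        return self.net(torch.cat([noisy_bev, condition], dim=1))

class CrossAttentionFusion(nn.Module):
    """Fuses BEV context (queries) with denoised semantics (keys/values)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, bev_feat, semantic_feat):
        b, c, h, w = bev_feat.shape
        q = bev_feat.flatten(2).transpose(1, 2)        # (B, H*W, C)
        kv = semantic_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        fused, _ = self.attn(q, kv, kv)
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: one refinement step on a 128-channel, 50x50 BEV grid.
dim = 128
bev = torch.randn(2, dim, 50, 50)
noisy = bev + 0.1 * torch.randn_like(bev)   # stand-in for a diffused sample
denoiser, fusion = ConditionalDenoiser(dim), CrossAttentionFusion(dim)
pred_noise = denoiser(noisy, bev)           # condition on the coarse BEV
refined = noisy - 0.1 * pred_noise          # crude one-step refinement
fused = fusion(bev, refined)                # cross-attention fusion
print(fused.shape)                          # torch.Size([2, 128, 50, 50])
```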
Related papers
- MambaBEV: An efficient 3D detection model with Mamba2 [4.782473183865045]
We propose a Mamba2-based BEV 3D object detection model named MambaBEV.
We also adapt an end-to-end self-driving paradigm to test the performance of the model.
arXiv Detail & Related papers (2024-10-16T15:37:29Z)
- BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving [3.4113606473878386]
We conduct a comprehensive cross-dataset evaluation of state-of-the-art BEV segmentation models.
We investigate the influence of different sensors, such as cameras and LiDAR, on the models' ability to generalize.
arXiv Detail & Related papers (2024-08-29T07:49:31Z)
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space [57.68134574076005]
We present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View latent space for environment modeling.
Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction.
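As a loose illustration of the "unified BEV latent space" idea, the sketch below fuses camera and LiDAR features that are already aligned on a BEV grid and compresses them into one compact latent; all modules, channel sizes, and the tokenizer name are placeholders assumed for this example, not BEVWorld's actual design.

```python
# Hypothetical sketch of a unified BEV latent space: camera and LiDAR
# branches are encoded separately, aligned on a shared BEV grid, and
# compressed into one compact latent. All modules are placeholders.
import torch
import torch.nn as nn

class UnifiedBEVTokenizer(nn.Module):
    def __init__(self, cam_ch=64, lidar_ch=32, latent_ch=16):
        super().__init__()
        # Assume upstream view transforms already produced BEV-aligned maps.
        self.cam_enc = nn.Conv2d(cam_ch, 64, 3, padding=1)
        self.lidar_enc = nn.Conv2d(lidar_ch, 64, 3, padding=1)
        self.compress = nn.Sequential(
            nn.GELU(),
            nn.Conv2d(128, latent_ch, 1),  # fuse and compact the channels
        )

    def forward(self, cam_bev, lidar_bev):
        fused = torch.cat([self.cam_enc(cam_bev), self.lidar_enc(lidar_bev)], 1)
        return self.compress(fused)

tokenizer = UnifiedBEVTokenizer()
latent = tokenizer(torch.randn(1, 64, 32, 32), torch.randn(1, 32, 32, 32))
print(latent.shape)  # torch.Size([1, 16, 32, 32])
```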
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential for environment perception in 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
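One plausible reading of "exploiting the complementary nature" of the two feature spaces is a mutual-consistency objective on unlabelled target data. The sketch below implements a symmetric KL consistency loss between an image-view head and a BEV head; the heads and the loss form are assumptions for illustration, not DA-BEV's actual method.

```python
# Hypothetical mutual-consistency objective for unlabelled target data:
# image-view and BEV heads act as each other's soft targets via symmetric KL.
import torch
import torch.nn.functional as F

def symmetric_consistency(logits_iv, logits_bev):
    """Symmetric KL between image-view and BEV class posteriors."""
    log_p_iv = F.log_softmax(logits_iv, dim=1)
    log_p_bev = F.log_softmax(logits_bev, dim=1)
    # Each branch's detached prediction supervises the other branch.
    kl_a = F.kl_div(log_p_iv, log_p_bev.detach().exp(), reduction="batchmean")
    kl_b = F.kl_div(log_p_bev, log_p_iv.detach().exp(), reduction="batchmean")
    return 0.5 * (kl_a + kl_b)

# Toy unlabelled-target batch: 4 samples, 10 classes.
logits_iv = torch.randn(4, 10, requires_grad=True)
logits_bev = torch.randn(4, 10, requires_grad=True)
loss = symmetric_consistency(logits_iv, logits_bev)
loss.backward()  # gradients reach both heads
print(loss.item())
```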
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects within the large coverage area of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
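The summary suggests a DiffusionDet-style detector in BEV: detection starts from random box proposals ("particles") that a network iteratively denoises toward object locations. The toy refiner below illustrates that loop; its architecture, pooling, and step count are assumptions, not the paper's model.

```python
# Toy stand-in for diffusion-style detection in BEV: random box proposals
# ("particles") are iteratively refined toward object locations, conditioned
# on a pooled BEV feature. Architecture and step count are illustrative.
import torch
import torch.nn as nn

class BoxDenoiser(nn.Module):
    """Refines noisy (x, y, w, l) BEV proposals given a global BEV context."""
    def __init__(self, feat_dim=128, box_dim=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(box_dim + feat_dim, 128), nn.GELU(),
            nn.Linear(128, box_dim),
        )

    def forward(self, boxes, context):
        # boxes: (B, N, box_dim); context: (B, feat_dim) pooled BEV feature
        ctx = context.unsqueeze(1).expand(-1, boxes.size(1), -1)
        return boxes + self.mlp(torch.cat([boxes, ctx], dim=-1))  # residual step

bev_feat = torch.randn(2, 128, 50, 50)
context = bev_feat.mean(dim=(2, 3))   # crude global pooling of the BEV map
boxes = torch.randn(2, 100, 4)        # random proposals to start from
denoiser = BoxDenoiser()
for _ in range(4):                    # a few reverse-diffusion-like steps
    boxes = denoiser(boxes, context)
print(boxes.shape)                    # torch.Size([2, 100, 4])
```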
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection [57.019527599167255]
Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements.
We show in our paper that directly applying quantization in BEV tasks will 1) make the training unstable, and 2) lead to intolerable performance degradation.
Our method, QD-BEV, introduces a novel view-guided distillation (VGD) objective, which can stabilize quantization-aware training (QAT) while enhancing model performance.
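As a hedged sketch of what a view-guided distillation objective for QAT could look like, the example below distills a fake-quantized student against a full-precision teacher in both the image view and the BEV view; the quantizer, feature shapes, and loss weights are illustrative assumptions, not QD-BEV's actual recipe.

```python
# Hypothetical view-guided distillation (VGD) loss for QAT: a full-precision
# teacher supervises a fake-quantized student in both the image view and the
# BEV view. Quantizer and loss weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def fake_quant(x, bits=8):
    """Straight-through uniform fake quantization, as used in QAT setups."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max() / qmax + 1e-8
    q = torch.round(x / scale).clamp(-qmax - 1, qmax)
    # Straight-through estimator: forward uses q*scale, backward sees identity.
    return x + (q * scale - x).detach()

def vgd_loss(stu_img, tea_img, stu_bev, tea_bev, w_img=1.0, w_bev=1.0):
    # Teacher features are detached so gradients flow only to the student.
    return (w_img * F.mse_loss(stu_img, tea_img.detach())
            + w_bev * F.mse_loss(stu_bev, tea_bev.detach()))

tea_img, tea_bev = torch.randn(2, 64, 32, 88), torch.randn(2, 64, 50, 50)
stu_img = fake_quant(tea_img + 0.05 * torch.randn_like(tea_img))
stu_bev = fake_quant(tea_bev + 0.05 * torch.randn_like(tea_bev))
print(vgd_loss(stu_img, tea_img, stu_bev, tea_bev))
```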
arXiv Detail & Related papers (2023-08-21T07:06:49Z)
- Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.