QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D
Object Detection
- URL: http://arxiv.org/abs/2308.10515v1
- Date: Mon, 21 Aug 2023 07:06:49 GMT
- Title: QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D
Object Detection
- Authors: Yifan Zhang, Zhen Dong, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yuan
Du, Kurt Keutzer, Li Du, Shanghang Zhang
- Abstract summary: Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements.
We show in our paper that directly applying quantization in BEV tasks will 1) make the training unstable, and 2) lead to intolerable performance degradation.
Our method QD-BEV enables a novel view-guided distillation (VGD) objective, which can stabilize the quantization-aware training (QAT) while enhancing the model performance.
- Score: 57.019527599167255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved
significant improvements. However, the huge memory consumption of
state-of-the-art models makes it hard to deploy them on vehicles, and the
non-trivial latency will affect the real-time perception of streaming
applications. Despite the wide application of quantization to lighten models,
we show in our paper that directly applying quantization in BEV tasks will 1)
make the training unstable, and 2) lead to intolerable performance degradation.
To solve these issues, our method QD-BEV enables a novel view-guided
distillation (VGD) objective, which can stabilize the quantization-aware
training (QAT) while enhancing the model performance by leveraging both image
features and BEV features. Our experiments show that QD-BEV achieves similar or
even better accuracy than previous methods with significant efficiency gains.
On the nuScenes datasets, the 4-bit weight and 6-bit activation quantized
QD-BEV-Tiny model achieves 37.2% NDS with only 15.8 MB model size,
outperforming BevFormer-Tiny by 1.8% with an 8x model compression. On the Small
and Base variants, QD-BEV models also perform superbly and achieve 47.9% NDS
(28.2 MB) and 50.9% NDS (32.9 MB), respectively.
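To make the "4-bit weight and 6-bit activation" setting concrete, the following is a minimal sketch of symmetric uniform fake quantization, the generic building block that quantization-aware training is usually built on. It is not taken from the QD-BEV paper or its code: the function names `fake_quantize` and `weight_compression_ratio`, the per-tensor max-abs scaling, and the sample values are all assumptions made for illustration. The second helper only shows the arithmetic behind an 8x reduction in weight storage when going from 32-bit floats to 4-bit weights.

```python
# Illustrative sketch only: generic symmetric uniform "fake" quantization.
# Names, scaling rule, and sample values are assumptions, not the QD-BEV method.

def fake_quantize(values, num_bits):
    """Snap each value to a signed num_bits integer grid, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 for 4 bits, 31 for 6 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                  # nothing to scale
    scale = max_abs / qmax                   # step size of the integer grid
    out = []
    for v in values:
        q = round(v / scale)                 # nearest integer level
        q = max(-qmax, min(qmax, q))         # clamp to the representable range
        out.append(q * scale)                # dequantize back to a float
    return out


def weight_compression_ratio(full_bits, quant_bits):
    """Storage ratio when each weight shrinks from full_bits to quant_bits."""
    return full_bits / quant_bits


weights = [0.70, -0.35, 0.10, -0.02]         # hypothetical tensor values
w4 = fake_quantize(weights, 4)               # 4-bit grid (weights in the abstract)
a6 = fake_quantize(weights, 6)               # 6-bit grid (activations in the abstract)
err4 = max(abs(a - b) for a, b in zip(weights, w4))
err6 = max(abs(a - b) for a, b in zip(weights, a6))
print("max 4-bit error:", err4)
print("max 6-bit error:", err6)
print("32-bit to 4-bit ratio:", weight_compression_ratio(32, 4))
```

In this sketch the rounding error of each value is bounded by half the grid step, so the coarser 4-bit grid tolerates a larger worst-case error than the 6-bit one. In actual QAT the rounding step is typically paired with a straight-through gradient estimator, which is omitted here.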
Related papers
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [58.5019443418822]
Diffusion models have been proven highly effective at generating high-quality images.
As these models grow larger, they require significantly more memory and suffer from higher latency.
In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits.
arXiv Detail & Related papers (2024-11-07T18:59:58Z)
- MambaBEV: An efficient 3D detection model with Mamba2 [4.782473183865045]
We propose a Mamba2-based BEV 3D object detection model named MambaBEV.
We also adapt an end-to-end self-driving paradigm to test the performance of the model.
arXiv Detail & Related papers (2024-10-16T15:37:29Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation [104.12419434114365]
In real-world applications, sensor corruptions and failures lead to inferior performance.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior art by a large margin on both full and corrupted modalities.
arXiv Detail & Related papers (2023-04-19T16:37:17Z)
- DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception [14.968177102647783]
We propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion models to generate a more comprehensive BEV representation.
In practice, we design three types of conditions to guide the training of the diffusion model which denoises the coarse samples and refines the semantic feature.
We show that DiffBEV achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach.
arXiv Detail & Related papers (2023-03-15T02:42:48Z)
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline [76.48192454417138]
Bird's-Eye View (BEV) representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception.
This paper proposes a framework, termed Fast-BEV, which is capable of performing faster BEV perception on the on-vehicle chips.
arXiv Detail & Related papers (2023-01-29T18:43:31Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M$^2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$^2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View [15.560366079077449]
We contribute the BEVDet paradigm for pushing the performance boundary in the 3D object detection task.
BEVDet is developed by following the principle of detecting 3D objects in Bird-Eye-View (BEV), where route planning can be handily performed.
The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance.
arXiv Detail & Related papers (2021-12-22T10:48:06Z)