Improving Post-Training Quantization on Object Detection with Task
Loss-Guided Lp Metric
- URL: http://arxiv.org/abs/2304.09785v3
- Date: Sun, 7 May 2023 16:16:13 GMT
- Title: Improving Post-Training Quantization on Object Detection with Task
Loss-Guided Lp Metric
- Authors: Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu
Liu
- Abstract summary: Post-Training Quantization (PTQ) transforms a full-precision model into low bit-width directly.
PTQ suffers severe accuracy drop when applied to complex tasks such as object detection.
DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters.
- Score: 43.81334288840746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient inference for object detection networks is a major challenge on
edge devices. Post-Training Quantization (PTQ), which transforms a
full-precision model into low bit-width directly, is an effective and
convenient approach to reduce model inference complexity. But it suffers severe
accuracy drop when applied to complex tasks such as object detection. PTQ
optimizes the quantization parameters by different metrics to minimize the
perturbation of quantization. The p-norm distance of feature maps before and
after quantization, Lp, is widely used as the metric to evaluate perturbation.
Owing to the particular structure of object detection networks, we observe that
the parameter p in the Lp metric significantly influences quantization
performance, and that a fixed hyper-parameter p does not achieve the optimal
result. To mitigate this problem, we propose a framework,
DetPTQ, to assign different p values for quantizing different layers using an
Object Detection Output Loss (ODOL), which represents the task loss of object
detection. DetPTQ employs the ODOL-based adaptive Lp metric to select the
optimal quantization parameters. Experiments show that our DetPTQ outperforms
the state-of-the-art PTQ methods by a significant margin on both 2D and 3D
object detectors. For example, we achieve
31.1/31.7(quantization/full-precision) mAP on RetinaNet-ResNet18 with 4-bit
weight and 4-bit activation.
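The abstract describes the core PTQ loop: quantization parameters (e.g. the scale) are chosen to minimize the Lp distance between feature maps before and after quantization, and the choice of p changes which scale wins. The sketch below illustrates that idea with a simple grid search over scales; the function names, the search range, and the use of a plain grid are illustrative assumptions, not the paper's actual DetPTQ implementation (which further selects p per layer via the task loss).

```python
import numpy as np

def quantize(x, scale, bits=4):
    # Uniform symmetric quantization: round to integers, clip to the bit range.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def lp_distance(a, b, p):
    # Mean p-norm perturbation between the original and quantized feature maps.
    return np.mean(np.abs(a - b) ** p)

def search_scale(fmap, p, bits=4, num_candidates=100):
    # Grid-search the quantization scale that minimizes the Lp metric.
    max_abs = np.abs(fmap).max()
    best_scale, best_err = None, np.inf
    for ratio in np.linspace(0.5, 1.0, num_candidates):
        scale = ratio * max_abs / (2 ** (bits - 1) - 1)
        err = lp_distance(fmap, quantize(fmap, scale, bits), p)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 16))
# Different p values can prefer different scales, which is why a fixed p
# is suboptimal across layers.
print(search_scale(fmap, p=2.0), search_scale(fmap, p=4.0))
```

Running the search with several p values on the same feature map shows how the metric, not just the data, determines the selected quantization parameters.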
Related papers
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection [35.35457515189062]
Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks.
LiDAR-PTQ can achieve state-of-the-art quantization performance when applied to CenterPoint.
LiDAR-PTQ is cost-effective, being $30\times$ faster than the quantization-aware training method.
arXiv Detail & Related papers (2024-01-29T03:35:55Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection [55.291579862817656]
Existing oriented object detection methods commonly use metric AP$_50$ to measure the performance of the model.
We argue that AP$_50$ is inherently unsuitable for oriented object detection due to its large tolerance in angle deviation.
We propose an Aspect Ratio Sensitive Oriented Object Detector with Transformer, termed ARS-DETR, which exhibits a competitive performance.
arXiv Detail & Related papers (2023-03-09T02:20:56Z)
- PD-Quant: Post-Training Quantization based on Prediction Difference Metric [43.81334288840746]
Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types.
Determining the appropriate quantization parameters is the main open problem.
PD-Quant is a method that addresses this limitation by considering global information.
arXiv Detail & Related papers (2022-12-14T05:48:58Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- n-hot: Efficient bit-level sparsity for powers-of-two neural network quantization [0.0]
Powers-of-two (PoT) quantization reduces the number of bit operations of deep neural networks on resource-constrained hardware.
PoT quantization triggers a severe accuracy drop because of its limited representation ability.
We propose an efficient PoT quantization scheme that balances accuracy and costs in a memory-efficient way.
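The n-hot summary above rests on powers-of-two (PoT) quantization, where each weight is constrained to a signed power of two so that multiplications reduce to bit shifts. A minimal PoT rounding sketch follows; it rounds in the log2 domain and collapses underflowing values to zero, and is an illustrative assumption rather than the paper's n-hot scheme.

```python
import numpy as np

def pot_quantize(w, bits=4):
    # Map each weight to the nearest signed power of two (nearest in log2).
    # With b bits, exponents span a small range ending at the largest
    # magnitude; values far below that range collapse to zero.
    max_exp = np.ceil(np.log2(np.abs(w).max()))
    n_levels = 2 ** (bits - 1) - 1
    min_exp = max_exp - n_levels + 1
    exps = np.round(np.log2(np.maximum(np.abs(w), 2.0 ** (min_exp - 1))))
    exps = np.clip(exps, min_exp, max_exp)
    q = np.sign(w) * 2.0 ** exps
    q[np.abs(w) < 2.0 ** (min_exp - 1)] = 0.0  # underflow -> zero code
    return q

w = np.array([0.9, -0.31, 0.07, 0.001])
print(pot_quantize(w))
```

The coarse spacing of powers of two near the top of the range is exactly the limited representation ability that causes the accuracy drop mentioned above.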
arXiv Detail & Related papers (2021-03-22T10:13:12Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Optimisation of the PointPillars network for 3D object detection in point clouds [1.1470070927586016]
In this paper we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud.
We performed the experiments for the PointPillars network, which offers a reasonable compromise between detection accuracy and calculation complexity.
This will allow for real-time LiDAR data processing with low energy consumption.
arXiv Detail & Related papers (2020-07-01T13:50:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.