Reducing the Side-Effects of Oscillations in Training of Quantized YOLO
Networks
- URL: http://arxiv.org/abs/2311.05109v1
- Date: Thu, 9 Nov 2023 02:53:21 GMT
- Title: Reducing the Side-Effects of Oscillations in Training of Quantized YOLO
Networks
- Authors: Kartik Gupta, Akshay Asthana
- Abstract summary: We show that it is difficult to achieve extremely low precision (4-bit and lower) for efficient YOLO models even with SOTA QAT methods due to oscillation issue.
We propose a simple QAT correction method, namely QC, that takes only a single epoch of training after standard QAT procedure to correct the error.
- Score: 5.036532914308394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantized networks use less computational and memory resources and
are suitable for deployment on edge devices. While quantization-aware training
(QAT) is a well-studied approach to quantizing networks at low precision, most
research focuses on over-parameterized networks for classification, with
limited studies on popular and edge-device-friendly single-shot object
detection and semantic segmentation methods like YOLO. Moreover, the majority
of QAT methods rely on the Straight-Through Estimator (STE) approximation,
which suffers from an oscillation phenomenon resulting in sub-optimal network
quantization. In this paper, we show that it is difficult to achieve extremely
low precision (4-bit and lower) for efficient YOLO models even with SOTA QAT
methods due to the oscillation issue, and that existing methods to overcome
this problem are not effective on these models. To mitigate the effect of
oscillation, we first propose an Exponential Moving Average (EMA)-based update
to the QAT model. Further, we propose a simple QAT correction method, namely
QC, that takes only a single epoch of training after the standard QAT
procedure to correct the error induced by oscillating weights and activations,
resulting in a more accurate quantized model. With extensive evaluation on the
COCO dataset using various YOLOv5 and YOLOv7 variants, we show that our
correction method improves quantized YOLO networks consistently on both object
detection and segmentation tasks at low precision (4-bit and 3-bit).
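As context for the oscillation issue and the EMA-based update mentioned in the abstract, below is a minimal PyTorch-style sketch of a symmetric fake-quantizer with a straight-through estimator plus an EMA "shadow" copy of the latent weights. All names are illustrative, not the authors' implementation; the QC correction step is not reproduced, since the abstract describes it only at a high level (one correction epoch after QAT).

```python
# Illustrative sketch only (not the paper's code): a symmetric uniform
# fake-quantizer with a straight-through estimator (STE), plus an EMA
# shadow of the latent weights. The STE rounds in the forward pass and
# passes gradients through unchanged, which is what allows latent weights
# to oscillate between neighbouring grid points; keeping an EMA of those
# latent weights is one way to smooth that oscillation out.
import torch


class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, scale, num_bits):
        qmax = 2 ** (num_bits - 1) - 1
        q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
        return q * scale  # dequantized ("fake-quantized") weights

    @staticmethod
    def backward(ctx, grad_output):
        # Plain STE: pass the incoming gradient straight to the latent weights.
        return grad_output, None, None


class EMAShadow:
    """Exponential moving average of the latent (full-precision) weights."""

    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = {n: p.detach().clone() for n, p in model.named_parameters()}

    @torch.no_grad()
    def update(self, model):
        for n, p in model.named_parameters():
            self.shadow[n].mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)


# Usage inside a QAT loop (the scale would normally be calibrated or learned):
# w_q = FakeQuantSTE.apply(layer.weight, scale, 4)   # 4-bit weights in forward
# loss.backward(); optimizer.step(); ema.update(model)
```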
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE).
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- GAQAT: Gradient-Adaptive Quantization-Aware Training for Domain Generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.
Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.
Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z)
- Q-YOLO: Efficient Inference for Real-time Object Detection [29.51643492051404]
Real-time object detection plays a vital role in various computer vision applications.
However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements.
This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed Q-YOLO.
arXiv Detail & Related papers (2023-07-01T03:50:32Z)
- SQuAT: Sharpness- and Quantization-Aware Training for BERT [43.049102196902844]
We propose sharpness- and quantization-aware training (SQuAT).
Our method can consistently outperform state-of-the-art quantized BERT models under 2-, 3-, and 4-bit settings by 1%.
Our experiments on empirical measurement of sharpness also suggest that our method would lead to flatter minima compared to other quantization methods.
arXiv Detail & Related papers (2022-10-13T16:52:19Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Overcoming Oscillations in Quantization-Aware Training [18.28657022169428]
When training neural networks with simulated quantization, quantized weights can, rather unexpectedly, oscillate between two grid-points.
We show that these oscillations can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics.
We propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing (see the sketch below).
arXiv Detail & Related papers (2022-03-21T16:07:42Z)
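A rough sketch of the iterative weight-freezing idea from the entry above, assuming oscillation is detected by tracking how often a weight's integer value flips back to its previous grid point; the threshold, decay, and class name are illustrative choices, not the paper's exact algorithm.

```python
# Illustrative sketch (not the paper's exact algorithm): track how often each
# weight's integer (quantized) value moves back to where it came from between
# consecutive steps; weights whose flip frequency exceeds a threshold are
# treated as oscillating and frozen at their current grid point.
import torch


class OscillationFreezer:
    def __init__(self, weight_shape, freq_threshold=0.02, ema_decay=0.99):
        self.freq = torch.zeros(weight_shape)        # EMA of flip events
        self.prev_int = None                         # previous integer weights
        self.prev_delta = torch.zeros(weight_shape)  # previous change direction
        self.frozen = torch.zeros(weight_shape, dtype=torch.bool)
        self.freq_threshold = freq_threshold
        self.ema_decay = ema_decay

    @torch.no_grad()
    def step(self, w_int):
        """w_int: current integer weights, e.g. round(w / scale)."""
        if self.prev_int is not None:
            delta = w_int - self.prev_int
            # A flip = the integer weight reversed its previous move.
            flip = (delta != 0) & (delta == -self.prev_delta)
            self.freq.mul_(self.ema_decay).add_(flip.float(), alpha=1 - self.ema_decay)
            self.prev_delta = torch.where(delta != 0, delta, self.prev_delta)
            self.frozen |= self.freq > self.freq_threshold
        self.prev_int = w_int.clone()
        return self.frozen  # mask: zero the gradients of frozen weights
```

In training, the returned mask would be used to exclude frozen weights from further updates; the dampening variant mentioned in the same entry instead, roughly, adds a regularization term that discourages latent weights from drifting away from their quantized values.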
- LG-LSQ: Learned Gradient Linear Symmetric Quantization [3.6816597150770387]
Deep neural networks with lower precision weights have advantages in terms of the cost of memory space and accelerator power.
The main challenge associated with the quantization algorithm is maintaining accuracy at low bit-widths.
We propose learned gradient linear symmetric quantization (LG-LSQ) as a method for quantizing weights and activation functions to low bit-widths (a generic sketch of the learnable-step-size idea follows below).
arXiv Detail & Related papers (2022-02-18T03:38:12Z)
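To illustrate the general idea of learning the quantization step size by backpropagation (the LSQ family to which the entry above belongs), here is a generic sketch; it is not the specific LG-LSQ gradient formulation, and the names, initial scale, and bit-width are placeholders.

```python
# Generic learnable-step-size (LSQ-style) fake-quantizer, for illustration only;
# LG-LSQ itself modifies how the gradient with respect to the scale is computed.
import torch
import torch.nn as nn


class LearnableSymmetricQuant(nn.Module):
    def __init__(self, num_bits=4, init_scale=0.05):
        super().__init__()
        self.qn = -(2 ** (num_bits - 1))
        self.qp = 2 ** (num_bits - 1) - 1
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learned step size

    def forward(self, w):
        s = self.scale.abs() + 1e-8          # keep the step size positive
        v = torch.clamp(w / s, self.qn, self.qp)
        # Straight-through trick on the rounding only: forward uses round(v),
        # backward sees the identity, so gradients reach both w and s.
        v_bar = v + (torch.round(v) - v).detach()
        return v_bar * s


# Example: quantize a weight tensor to 4 bits with a learnable scale.
# quant = LearnableSymmetricQuant(num_bits=4)
# w_q = quant(weight)   # use w_q in the layer's forward pass during QAT
```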
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Learnable Companding Quantization for Accurate Low-bit Neural Networks [3.655021726150368]
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed.
It is still hard for extremely low-bit models to achieve accuracy comparable with that of full-precision models.
We propose learnable companding quantization (LCQ) as a novel non-uniform quantization method for 2-, 3-, and 4-bit models.
arXiv Detail & Related papers (2021-03-12T09:06:52Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)