Modular Quantization-Aware Training: Increasing Accuracy by Decreasing
Precision in 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2303.06753v2
- Date: Wed, 29 Nov 2023 01:17:18 GMT
- Title: Modular Quantization-Aware Training: Increasing Accuracy by Decreasing
Precision in 6D Object Pose Estimation
- Authors: Saqib Javed, Chengkun Li, Andrew Price, Yinlin Hu, Mathieu Salzmann
- Abstract summary: Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
- Score: 56.80039657816035
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Edge applications, such as collaborative robotics and spacecraft rendezvous,
demand efficient 6D object pose estimation on resource-constrained embedded
platforms. Existing 6D pose estimation networks are often too large for such
deployments, necessitating compression while maintaining reliable performance.
To address this challenge, we introduce Modular Quantization-Aware Training
(MQAT), an adaptive and mixed-precision quantization-aware training strategy
that exploits the modular structure of modern 6D pose estimation architectures.
MQAT guides a systematic gradated modular quantization sequence and determines
module-specific bit precisions, leading to quantized models that outperform
those produced by state-of-the-art uniform and mixed-precision quantization
techniques. Our experiments showcase the generality of MQAT across datasets,
architectures, and quantization algorithms. Remarkably, MQAT-trained quantized
models achieve a significant accuracy boost (>7%) over the baseline
full-precision network while reducing model size by a factor of 4 or more.
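To make the idea of module-wise, mixed-precision quantization-aware training concrete, the following PyTorch sketch shows the general mechanism: each module carries its own bit width and is switched to low precision one module at a time, with fine-tuning after each switch. This is a minimal illustration, not the authors' implementation; the module names ("backbone", "neck", "head"), the 4/6/8-bit choices, the straight-through-estimator fake quantizer, and the toy training loop are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    @staticmethod
    def forward(ctx, w, n_bits):
        qmax = 2 ** (n_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through: gradients pass unchanged

class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized to a module-specific bit width."""
    def __init__(self, in_features, out_features, n_bits=8):
        super().__init__(in_features, out_features)
        self.n_bits = n_bits
        self.quantize = False  # switched on later by the modular schedule

    def forward(self, x):
        w = FakeQuantize.apply(self.weight, self.n_bits) if self.quantize else self.weight
        return nn.functional.linear(x, w, self.bias)

# Hypothetical pose-estimation network split into modules, each with its own bit width.
model = nn.ModuleDict({
    "backbone": QuantLinear(128, 128, n_bits=4),
    "neck":     QuantLinear(128, 64,  n_bits=6),
    "head":     QuantLinear(64,  9,   n_bits=8),  # e.g. rotation + translation outputs
})

def forward(x):
    return model["head"](model["neck"](model["backbone"](x)))

# Gradated modular schedule: enable quantization one module at a time,
# fine-tuning for a few steps after each module drops to low precision.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for module_name in ["backbone", "neck", "head"]:
    model[module_name].quantize = True
    for _ in range(10):  # toy fine-tuning steps per stage
        x, target = torch.randn(8, 128), torch.randn(8, 9)
        loss = nn.functional.mse_loss(forward(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The point mirrored here is that precision is assigned and introduced per module in a fixed sequence, rather than quantizing the whole network uniformly in one step.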
Related papers
- Q-S5: Towards Quantized State Space Models [41.94295877935867]
State Space Models (SSMs) have emerged as a potent alternative to transformers.
This paper investigates the effect of quantization on the S5 model to understand its impact on model performance.
arXiv Detail & Related papers (2024-06-13T09:53:24Z)
- Adaptive quantization with mixed-precision based on low-cost proxy [8.527626602939105]
This paper proposes a novel model quantization method, the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ).
A hardware-aware module is designed around the hardware's limitations, while an adaptive mixed-precision quantization module evaluates quantization sensitivity.
Experiments on ImageNet demonstrate that the proposed LCPAQ achieves quantization accuracy comparable or superior to existing mixed-precision models.
arXiv Detail & Related papers (2024-02-27T17:36:01Z)
- A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs [19.835810073852244]
This study explores quantisation-aware training (QAT) on time series Transformer models.
We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase (see the quantizer sketch after this list).
arXiv Detail & Related papers (2023-10-04T08:25:03Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models [38.76165207636793]
We propose a data-free distillation method that leverages generations produced by the pre-trained model.
In addition to quantizing weights and activations, we also quantize the KV cache, which is critical for increasing throughput.
We experiment with LLaMA models of sizes 7B, 13B, and 30B, at quantization levels down to 4-bits.
arXiv Detail & Related papers (2023-05-29T05:22:11Z)
- CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification [51.81850995661478]
Mixed-precision quantization has been widely applied to deep neural networks (DNNs).
Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence.
We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
arXiv Detail & Related papers (2022-12-06T05:44:21Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- Learnable Companding Quantization for Accurate Low-bit Neural Networks [3.655021726150368]
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed.
However, it remains hard for extremely low-bit models to achieve accuracy comparable to that of full-precision models.
We propose learnable companding quantization (LCQ) as a novel non-uniform quantization method for 2-, 3-, and 4-bit models.
arXiv Detail & Related papers (2021-03-12T09:06:52Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
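Several of the papers above build on uniform quantizers; for example, the adaptive QAT scheme for time-series Transformers selects per tensor between symmetric and asymmetric variants. The NumPy sketch below is an illustrative assumption, not code from any of the listed papers; it contrasts the two schemes on a skewed tensor, where the asymmetric quantizer typically gives lower reconstruction error.

```python
import numpy as np

def symmetric_quantize(x, n_bits=8):
    """Zero-point fixed at 0; range is symmetric around zero (cheap in hardware)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized ("fake-quantized") values

def asymmetric_quantize(x, n_bits=8):
    """Range [min, max] mapped to [0, 2^b - 1] via a zero-point; suits skewed tensors."""
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale

# A skewed, non-negative tensor (e.g. post-ReLU activations) favours the asymmetric scheme.
x = np.abs(np.random.randn(1024)) + 0.5
for name, fn in [("symmetric", symmetric_quantize), ("asymmetric", asymmetric_quantize)]:
    print(f"{name:>10} 8-bit reconstruction MSE: {np.mean((x - fn(x)) ** 2):.6f}")
```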
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.