QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
- URL: http://arxiv.org/abs/2307.04535v1
- Date: Mon, 10 Jul 2023 13:01:08 GMT
- Title: QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
- Authors: Jorn Peters, Marios Fournarakis, Markus Nagel, Mart van Baalen, Tijmen
Blankevoort
- Abstract summary: Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices.
We propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training.
We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints.
- Score: 19.491778184055118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantizing neural networks is one of the most effective methods for achieving
efficient inference on mobile and embedded devices. In particular, mixed
precision quantized (MPQ) networks, whose layers can be quantized to different
bitwidths, achieve better task performance for the same resource constraint
compared to networks with homogeneous bitwidths. However, finding the optimal
bitwidth allocation is a challenging problem as the search space grows
exponentially with the number of layers in the network. In this paper, we
propose QBitOpt, a novel algorithm for updating bitwidths during
quantization-aware training (QAT). We formulate the bitwidth allocation problem
as a constraint optimization problem. By combining fast-to-compute
sensitivities with efficient solvers during QAT, QBitOpt can produce
mixed-precision networks with high task performance guaranteed to satisfy
strict resource constraints. This contrasts with existing mixed-precision
methods that learn bitwidths using gradients and cannot provide such
guarantees. We evaluate QBitOpt on ImageNet and confirm that we outperform
existing fixed and mixed-precision methods under average bitwidth constraints
commonly found in the literature.
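To make the abstract's approach concrete, here is a minimal Python sketch of sensitivity-driven bitwidth allocation under an average-bitwidth constraint. The greedy upgrade rule, candidate bitwidths, layer names, and sensitivity values are illustrative assumptions, not the paper's algorithm; QBitOpt itself poses the allocation as a constrained optimization problem and solves it with efficient solvers during QAT.

```python
# Illustrative sketch only: greedy sensitivity-driven bitwidth allocation
# under an average-bitwidth constraint. The greedy rule, candidate bitwidths,
# and example sensitivities are assumptions for illustration; QBitOpt instead
# solves a constrained optimization problem during QAT.

def allocate_bitwidths(sensitivities, avg_budget=4.0, candidates=(2, 4, 8)):
    """Assign each layer a bitwidth so the mean bitwidth stays <= avg_budget.

    sensitivities: dict mapping layer name -> estimated increase in task loss
    when that layer is quantized more aggressively (higher = more sensitive).
    """
    n = len(sensitivities)
    budget_bits = avg_budget * n  # total bits allowed across all layers
    # Start every layer at the lowest candidate bitwidth (constraint holds).
    alloc = {name: candidates[0] for name in sensitivities}

    # Greedily grant extra bits to the most sensitive layers first.
    for name in sorted(sensitivities, key=sensitivities.get, reverse=True):
        for b in candidates[1:]:
            if sum(alloc.values()) - alloc[name] + b <= budget_bits:
                alloc[name] = b  # upgrade while the average constraint holds
    return alloc


if __name__ == "__main__":
    # Hypothetical per-layer sensitivities for a small network.
    sens = {"conv1": 0.9, "conv2": 0.2, "conv3": 0.05, "fc": 0.6}
    print(allocate_bitwidths(sens, avg_budget=4.0))
```

With these example sensitivities, the sketch gives 8 bits to the most sensitive layer, 4 bits to the next, and 2 bits to the rest, keeping the average at exactly 4 bits; replacing the greedy rule with a proper solver is what lets QBitOpt guarantee that the resource constraint is satisfied.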
Related papers
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z)
- AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution [53.23803932357899]
We introduce the first on-the-fly adaptive quantization framework that accelerates the processing time from hours to seconds.
We achieve performance competitive with previous adaptive quantization methods, while accelerating the processing time by a factor of 2,000.
arXiv Detail & Related papers (2024-04-04T08:37:27Z)
- SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions [20.241671088121144]
Recent quantization techniques have enabled heterogeneous precisions at very fine granularity.
These networks require additional hardware to decode the precision settings for individual variables, align the variables, and provide fine-grained mixed-precision compute capabilities.
We present an end-to-end co-design approach to efficiently execute networks with fine-grained heterogeneous precisions.
arXiv Detail & Related papers (2023-11-23T17:20:09Z)
- Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge [17.277918711842457]
Mixed-precision quantization offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy.
This paper proposes a hybrid search methodology to navigate the search space of mixed-precision configurations for a given network.
It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware optimization to find mixed-precision configurations latency-optimized for a specific hardware target.
arXiv Detail & Related papers (2023-07-06T09:57:48Z)
- MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization [51.85834744835766]
We propose MBQuant, a novel method for arbitrary bit-width quantization.
We show that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods.
arXiv Detail & Related papers (2023-05-14T10:17:09Z)
- A Practical Mixed Precision Algorithm for Post-Training Quantization [15.391257986051249]
Mixed-precision quantization is a promising solution to find a better performance-efficiency trade-off than homogeneous quantization.
We present a simple post-training mixed precision algorithm that only requires a small unlabeled calibration dataset.
We show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.
arXiv Detail & Related papers (2023-02-10T17:47:54Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer programming problem.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
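For background, the integer program referenced above is typically of the following generic form for bitwidth allocation; this is an illustrative formulation, not OMPQ's exact objective, which replaces it with the orthogonality-based proxy:

```latex
% Generic mixed-precision integer program (illustrative; not OMPQ's exact form).
% Omega_l(b) is the cost of quantizing layer l to b bits, n_l is the number of
% parameters in layer l, and B is the total bit budget for the model.
\begin{aligned}
\min_{b_1,\dots,b_L}\ & \sum_{l=1}^{L} \Omega_l(b_l) \\
\text{s.t.}\ & \sum_{l=1}^{L} n_l \, b_l \le B, \qquad b_l \in \{2, 4, 8\}\ \ \forall l
\end{aligned}
```

Solving this exactly is expensive because the search space grows exponentially with the number of layers, which is why cheap proxy objectives such as the one above are attractive.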
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
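As a small sanity check of the idea (the scale and offset convention here is an assumption, not necessarily the paper's exact encoding), a k-bit unsigned integer tensor can be rewritten as a weighted sum of k binary {-1, +1} tensors plus a constant offset, so a quantized layer can be evaluated as k binary branches:

```python
# Illustrative {-1, +1} decomposition of a k-bit unsigned integer tensor into
# k binary branches. The scale/offset convention is an assumption for
# illustration; the paper's exact encoding scheme may differ.
import numpy as np

def decompose_pm1(x, k):
    """Decompose integers in [0, 2**k - 1] into k {-1, +1} planes.

    Returns (branches, scales, offset) such that
    x == sum_i scales[i] * branches[i] + offset.
    """
    branches, scales = [], []
    for i in range(k):
        bit = (x >> i) & 1            # i-th binary digit, in {0, 1}
        branches.append(2 * bit - 1)  # map {0, 1} -> {-1, +1}
        scales.append(2 ** i / 2.0)   # since 2^i * bit = 2^(i-1) * (branch + 1)
    offset = (2 ** k - 1) / 2.0       # accumulated constant terms
    return branches, scales, offset

if __name__ == "__main__":
    x = np.array([0, 3, 7, 12, 15])   # 4-bit example values
    branches, scales, offset = decompose_pm1(x, k=4)
    recon = sum(s * b for s, b in zip(scales, branches)) + offset
    assert np.allclose(recon, x)      # reconstruction is exact
    print(recon)
```

Each branch then only involves {-1, +1} values, which is the property such multi-branch binary decompositions exploit for acceleration.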
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Bit-Mixer: Mixed-precision networks with runtime bit-width selection [72.32693989093558]
Bit-Mixer is the first method to train a meta-quantized network where during test time any layer can change its bit-width without affecting the overall network's ability for highly accurate inference.
We show that our method can result in mixed precision networks that exhibit the desirable flexibility properties for on-device deployment without compromising accuracy.
arXiv Detail & Related papers (2021-03-31T17:58:47Z)
- Rethinking Differentiable Search for Mixed-Precision Neural Networks [83.55785779504868]
Low-precision networks with weights and activations quantized to low bit-width are widely used to accelerate inference on edge devices.
Current solutions are uniform, using an identical bit-width for all filters.
This fails to account for the different sensitivities of different filters and is suboptimal.
Mixed-precision networks address this problem, by tuning the bit-width to individual filter requirements.
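As a toy illustration of this sensitivity argument (not the method from the paper), one can measure the error that uniform quantization at a fixed bitwidth introduces into each filter of a convolution weight tensor; filters with wider value ranges incur noticeably larger error, which is what per-filter bit-width tuning can exploit:

```python
# Toy illustration (not the paper's method): per-filter quantization error of
# a random conv weight tensor. Filters with wider value ranges incur larger
# rounding error at a fixed bitwidth, motivating per-filter bit-widths.
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of w to the given bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
# 8 output filters of shape 16x3x3; give each filter a different weight scale
# so their value ranges (and hence quantization errors) differ.
weights = rng.normal(size=(8, 16, 3, 3)) * rng.uniform(0.1, 2.0, size=(8, 1, 1, 1))

for bits in (2, 4, 8):
    errors = [np.mean((f - quantize_symmetric(f, bits)) ** 2) for f in weights]
    print(f"{bits}-bit per-filter MSE: min={min(errors):.2e}, max={max(errors):.2e}")
```

The spread between the smallest and largest per-filter error at a fixed bitwidth is the slack that a per-filter bit-width assignment can trade off against hardware cost.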
arXiv Detail & Related papers (2020-04-13T07:02:23Z)