BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization
- URL: http://arxiv.org/abs/2002.03090v2
- Date: Tue, 11 Aug 2020 20:30:52 GMT
- Title: BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization
- Authors: Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto
  Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon and
  Andreas Moshovos
- Abstract summary: We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy.
With ImageNet, the method produces an average per-layer bitlength of 4.13, 3.76 and 4.36 bits on AlexNet, ResNet18 and MobileNet V2, respectively.
- Score: 57.14179747713731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have demonstrably achieved state-of-the-art accuracy using
low-bitlength integer quantization, yielding both execution time and energy
benefits on existing hardware designs that support short bitlengths. However,
the question of finding the minimum bitlength for a desired accuracy remains
open. We introduce a training method for minimizing inference bitlength at any
granularity while maintaining accuracy. Namely, we propose a regularizer that
penalizes large bitlength representations throughout the architecture and show
how it can be modified to minimize other quantifiable criteria, such as number
of operations or memory footprint. We demonstrate that our method learns
thrifty representations while maintaining accuracy. With ImageNet, the method
produces an average per-layer bitlength of 4.13, 3.76 and 4.36 bits on AlexNet,
ResNet18 and MobileNet V2 respectively, remaining within 2.0%, 0.5% and 0.5% of
the base TOP-1 accuracy.
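As a rough illustration of the abstract's idea, a per-layer bitlength can be made a trainable parameter whose sum is penalized alongside the task loss. The PyTorch sketch below is an outline under assumed details, not the authors' implementation: the quantizer form, the straight-through rounding, and the names LearnedBitQuant, bitlength_regularizer and lam are hypothetical.

```python
import torch
import torch.nn as nn


def ste_round(t: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass, behave as the identity in the backward pass.
    return t + (t.round() - t).detach()


class LearnedBitQuant(nn.Module):
    """Illustrative per-tensor quantizer with a learnable, real-valued bitlength."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))  # learned bitlength

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bits = ste_round(self.bits.clamp(min=1.0))      # integer bitlength, gradients still flow
        levels = 2.0 ** bits - 1.0                      # number of quantization steps
        scale = x.detach().abs().max().clamp(min=1e-8)  # simple per-tensor scale (assumption)
        return ste_round(x / scale * levels) / levels * scale


def bitlength_regularizer(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    # Penalty on the sum of learned bitlengths; per-layer weights could instead
    # reflect operation counts or memory footprint, as the abstract suggests.
    bits = [m.bits.clamp(min=1.0) for m in model.modules()
            if isinstance(m, LearnedBitQuant)]
    if not bits:
        return torch.tensor(0.0)
    return lam * torch.stack(bits).sum()
```

In training, the penalty would simply be added to the task loss, e.g. loss = criterion(model(x), y) + bitlength_regularizer(model), so that gradient descent trades accuracy against shorter bitlengths.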
Related papers
- Memory Efficient Optimizers with 4-bit States [22.605392665667136]
We push optimizer state bitwidth down to 4 bits through a detailed empirical analysis of first and second moments.
We use a smaller block size and propose to utilize both row-wise and column-wise information for better quantization.
Our 4-bit optimizers are evaluated on a wide variety of benchmarks including natural language understanding, machine translation, image classification, and instruction tuning.
arXiv Detail & Related papers (2023-09-04T10:27:17Z)
- Model-based Optimization of Superconducting Qubit Readout [59.992881941624965]
We demonstrate model-based readout optimization for superconducting qubits.
We observe 1.5% error per qubit with a 500ns end-to-end duration and minimal excess reset error from residual resonator photons.
This technique can scale to hundreds of qubits and be used to enhance the performance of error-correcting codes and near-term applications.
arXiv Detail & Related papers (2023-08-03T23:30:56Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs [9.807687918954763]
Convolutional Neural Networks (CNNs) have become the standard class of deep neural network for image processing, classification and segmentation tasks.
RedBit is an open-source framework that provides a transparent, easy-to-use interface to evaluate the effectiveness of different quantization algorithms on network accuracy.
arXiv Detail & Related papers (2023-01-15T21:27:35Z)
- The case for 4-bit precision: k-bit Inference Scaling Laws [75.4335600212427]
Quantization methods reduce the number of bits required to represent each parameter in a model.
The final model size depends on both the number of parameters of the original model and the rate of compression.
We run more than 35,000 zero-shot experiments with 16-bit inputs and k-bit parameters to examine which quantization methods improve scaling for 3 to 8-bit precision.
arXiv Detail & Related papers (2022-12-19T18:48:33Z)
- Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training [31.515532976570643]
We show that single-level scaling is sufficient to maintain training accuracy while maximizing arithmetic density.
We propose Accuracy Booster, a mixed-mantissa HBFP technique that uses 4-bit mantissas for over 99% of all arithmetic operations in training.
arXiv Detail & Related papers (2022-11-19T16:17:11Z)
- n-hot: Efficient bit-level sparsity for powers-of-two neural network quantization [0.0]
Powers-of-two (PoT) quantization reduces the number of bit operations of deep neural networks on resource-constrained hardware.
However, PoT quantization can trigger a severe accuracy drop because of its limited representation ability.
We propose an efficient PoT quantization scheme that balances accuracy and costs in a memory-efficient way.
arXiv Detail & Related papers (2021-03-22T10:13:12Z)
- Bayesian Bits: Unifying Quantization and Pruning [73.27732135853243]
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization.
We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks.
arXiv Detail & Related papers (2020-05-14T16:00:34Z)
- Training Binary Neural Networks with Real-to-Binary Convolutions [52.91164959767517]
We show how to train binary networks to within a few percentage points of their full-precision counterparts.
We show how to build a strong baseline, which already achieves state-of-the-art accuracy.
We show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2020-03-25T17:54:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.