DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN
Inference
- URL: http://arxiv.org/abs/2306.16430v2
- Date: Wed, 22 Nov 2023 15:39:14 GMT
- Title: DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN
Inference
- Authors: Bahareh Khabbazan, Marc Riera, Antonio González
- Abstract summary: Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity.
We propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss.
- Score: 0.2724035499453557
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the
storage and computational complexity by decreasing the arithmetic precision of
activations and weights, a.k.a. tensors. Efficient hardware architectures
employ linear quantization to enable the deployment of recent DNNs onto
embedded systems and mobile devices. However, linear uniform quantization
usually cannot reduce the numerical precision below 8 bits without sacrificing
model accuracy. The accuracy loss arises because tensors do not follow uniform
distributions. In this paper, we show that a significant fraction of tensors
fit an exponential distribution. We therefore propose DNA-TEQ, which quantizes
DNN tensors exponentially with an adaptive scheme that achieves the best
trade-off between numerical precision and accuracy loss. Experimental results
show that DNA-TEQ requires a much lower quantization bit-width than previous
proposals, resulting in an average compression ratio of 40% over the linear
INT8 baseline, with negligible accuracy loss and without retraining the DNNs.
In addition, DNA-TEQ enables dot-product operations to be performed directly in
the exponential domain, which saves 66% of energy consumption on average for a
set of widely used DNNs.
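To make the scheme concrete, below is a minimal NumPy sketch of power-of-base (exponential) quantization and of a dot product carried out in the exponent domain, where each multiplication becomes an addition of exponents. The base (sqrt(2)), the 4-bit exponent width, and all function names are illustrative assumptions; DNA-TEQ's adaptive per-tensor parameter selection is not reproduced here.

```python
import numpy as np

def exp_quantize(x, base=np.sqrt(2), bits=4):
    """Quantize |x| to the nearest power of `base`, keeping the sign.

    Illustrative sketch only: DNA-TEQ adapts its quantization parameters per
    tensor, which is not reproduced here.
    """
    sign = np.sign(x)
    mag = np.abs(x)
    # Avoid log of zero; values below a small threshold map to zero.
    nonzero = mag > 1e-12
    exponents = np.zeros_like(mag, dtype=np.int32)
    exponents[nonzero] = np.rint(np.log(mag[nonzero]) / np.log(base)).astype(np.int32)
    # Clamp exponents to the signed range representable with `bits` bits.
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    exponents = np.clip(exponents, lo, hi)
    return sign, exponents, nonzero

def exp_dequantize(sign, exponents, nonzero, base=np.sqrt(2)):
    return np.where(nonzero, sign * base ** exponents.astype(np.float64), 0.0)

def exp_dot(sa, ea, na, sb, eb, nb, base=np.sqrt(2)):
    """Dot product in the exponential domain: each product of two quantized
    values is an addition of exponents plus one power evaluation (in hardware,
    a small lookup table)."""
    valid = na & nb
    prod_sign = sa * sb
    prod_exp = ea + eb               # multiplication -> exponent addition
    return np.sum(np.where(valid, prod_sign * base ** prod_exp, 0.0))

if __name__ == "__main__":
    w = np.random.randn(128)
    a = np.abs(np.random.randn(128))          # e.g. post-ReLU activations
    sw, ew, nw = exp_quantize(w)
    sa_, ea_, na_ = exp_quantize(a)
    print("float dot:", float(w @ a), "exp-domain dot:", float(exp_dot(sw, ew, nw, sa_, ea_, na_)))
```

With a small exponent alphabet, the per-product powers can be served from a lookup table instead of a multiplier, which is intuitively where the energy savings of exponent-domain dot products would come from.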
Related papers
- Towards Cheaper Inference in Deep Networks with Lower Bit-Width
Accumulators [25.100092698906437]
Current hardware still relies on high-precision core operations.
This is because, so far, using low-precision accumulators has led to significant performance degradation.
We present a simple method to train and fine-tune high-end DNNs that allows, for the first time, the use of cheaper 12-bit accumulators.
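The summary does not spell out the arithmetic; as a rough, hypothetical illustration (not the paper's training method), a narrow accumulator can be emulated by saturating the running sum of int8 products to the accumulator's range:

```python
import numpy as np

def saturating_dot_int8(a, b, acc_bits=12):
    """Dot product of two int8 vectors with a saturating low-width accumulator.

    Purely illustrative: the referenced paper trains/fine-tunes the network so
    that accuracy survives such narrow accumulators; that part is not shown.
    """
    lo, hi = -(2 ** (acc_bits - 1)), 2 ** (acc_bits - 1) - 1
    acc = 0
    for x, y in zip(a, b):
        acc = int(np.clip(acc + int(x) * int(y), lo, hi))  # saturate after each MAC
    return acc

a = np.random.randint(-128, 128, size=64, dtype=np.int8)
b = np.random.randint(-128, 128, size=64, dtype=np.int8)
print(saturating_dot_int8(a, b), int(a.astype(np.int32) @ b.astype(np.int32)))
```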
arXiv Detail & Related papers (2024-01-25T11:46:01Z)
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- QEBVerif: Quantization Error Bound Verification of Neural Networks [6.327780998441913]
Quantization is widely regarded as a promising technique for deploying deep neural networks (DNNs) on edge devices.
Existing verification methods focus either on individual neural networks (DNNs or QNNs) or on the quantization error bound for partial quantization.
We propose a quantization error bound verification method, named QEBVerif, where both weights and activation tensors are quantized.
arXiv Detail & Related papers (2022-12-06T06:34:38Z)
- Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks [0.0]
The biggest challenge in deploying Deep Neural Networks (DNNs) close to where the data is generated, i.e., on edge devices, is their size: memory footprint and computational complexity.
We propose a post-training quantization flow without the need for retraining.
We exceed the state of the art at 6 bits by 2.2% Top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2022-10-14T15:43:57Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
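As a hedged sketch of why an LNS pairs well with multiplicative updates (this is not LNS-Madam's exact rule, and the update signal `u` below is a stand-in): when a weight is stored as a sign plus log2 of its magnitude, a multiplicative update becomes a simple addition on the stored logarithm.

```python
import numpy as np

# Weights in a logarithmic number system: a sign plus log2 of the magnitude
# (kept as floats here; real LNS hardware would quantize log_w).
rng = np.random.default_rng(0)
sign = np.where(rng.standard_normal(8) < 0, -1.0, 1.0)
log_w = rng.uniform(-6, 0, size=8)            # log2 |w|

def decode(sign, log_w):
    return sign * 2.0 ** log_w

# A multiplicative update w <- w * 2**(-lr * u) needs only an addition on
# log_w; `u` is a placeholder for whatever normalized gradient signal the
# optimizer produces.
lr = 0.1
u = np.sign(rng.standard_normal(8))
log_w = log_w - lr * u

print(decode(sign, log_w))
```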
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks [0.0]
We propose a new pruning method called Pruning for Quantization (PfQ), which removes the filters that disturb the fine-tuning of the DNN.
Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size.
arXiv Detail & Related papers (2020-11-13T04:12:54Z)
- Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the accuracy drop is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding full-precision networks (FPNs), but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
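A minimal sketch of the Bounded ReLU idea (the bound value and names below are arbitrary examples): clipping activations to a fixed upper limit makes their range known in advance, so an integer scale can be chosen without outliers inflating it.

```python
import numpy as np

def bounded_relu(x, bound=6.0):
    # clip(x, 0, bound): the activation range is fixed regardless of outliers.
    return np.clip(x, 0.0, bound)

def quantize_activation(x, bound=6.0, bits=8):
    scale = bound / (2 ** bits - 1)            # unsigned range [0, bound]
    return np.rint(bounded_relu(x, bound) / scale).astype(np.uint8), scale

x = np.random.randn(10) * 4.0
q, scale = quantize_activation(x)
print(q, q.astype(np.float32) * scale)         # codes and their dequantized values
```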
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load at the same accuracy.
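A hypothetical sketch of the regularizers suggested by the title: parameterize a layer as W = U diag(s) Vᵀ, keep U and V near-orthogonal, and push singular values toward zero so the layer becomes low rank. The penalty weights and names are assumptions; the paper's actual losses and schedule may differ.

```python
import numpy as np

def svd_layer_penalty(U, s, V, l1=1e-4, ortho=1e-3):
    """Orthogonality penalty on U, V plus L1 sparsity on the singular values."""
    eye_u = np.eye(U.shape[1])
    eye_v = np.eye(V.shape[1])
    ortho_pen = np.sum((U.T @ U - eye_u) ** 2) + np.sum((V.T @ V - eye_v) ** 2)
    sparsity_pen = np.sum(np.abs(s))
    return ortho * ortho_pen + l1 * sparsity_pen

U = np.linalg.qr(np.random.randn(256, 64))[0]   # layer factor, 256x64
V = np.linalg.qr(np.random.randn(128, 64))[0]   # layer factor, 128x64
s = np.abs(np.random.randn(64))                 # singular values
print(svd_layer_penalty(U, s, V))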
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)