Related papers: Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation

Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation

URL: http://arxiv.org/abs/2405.14024v1
Date: Wed, 22 May 2024 21:59:46 GMT
Title: Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Authors: Mykhailo Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong,
Abstract summary: We introduce a novel approach for DNN quantization that uses a redundant representation of DNN's output. We demonstrate that this mapping can reduce quantization error. Our approach can be applied to other tasks, including segmentation, object detection, and key-points prediction.
Score: 3.4606942690643336
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Quantization is widely used to increase deep neural networks' (DNN) memory, computation, and power efficiency. Various techniques, such as post-training quantization and quantization-aware training, have been proposed to improve quantization quality. We introduce a novel approach for DNN quantization that uses a redundant representation of DNN's output. We represent the target quantity as a point on a 2D parametric curve. The DNN model is modified to predict 2D points that are mapped back to the target quantity at a post-processing stage. We demonstrate that this mapping can reduce quantization error. For the low-order parametric Hilbert curve, Depth-From-Stereo task, and two models represented by U-Net architecture and vision transformer, we achieved a quantization error reduction by about 5 times for the INT8 model at both CPU and DSP delegates. This gain comes with a minimal inference time increase (less than 7%). Our approach can be applied to other tasks, including segmentation, object detection, and key-points prediction.

Related papers

Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights. BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy. Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z)
Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation [3.9177379733188715]
We present an end-to-end solution that aims to address these challenges for efficient GNNs in resource constrained environments. We introduce a quantization based approach for all stages of GNNs, from message passing in training to node classification. The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
arXiv Detail & Related papers (2023-08-29T00:25:02Z)
Designing strong baselines for ternary neural network quantization through support and mass equalization [7.971065005161565]
Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision. This computational burden can be dramatically reduced by quantizing floating point values to ternary values. We show experimentally that our approach allows to significantly improve the performance of ternary quantization through a variety of scenarios.
arXiv Detail & Related papers (2023-06-30T07:35:07Z)
GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks [80.29667394618625]
We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures. We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs.
arXiv Detail & Related papers (2022-08-26T08:00:02Z)
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks. DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons. We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights. Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks [0.0]
We propose a new pruning method called Pruning for Quantization (PfQ) which removes the filters that disturb the fine-tuning of the DNN. Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size.
arXiv Detail & Related papers (2020-11-13T04:12:54Z)
Subtensor Quantization for Mobilenets [5.735035463793008]
Quantization for deep neural networks (DNN) have enabled developers to deploy models with less memory and more efficient low-power inference. In this paper, we analyzed several root causes of quantization loss and proposed alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on ImageNet dataset, and our post-training quantized 8-bit inference top-1 accuracy in within 0.7% of the floating point version.
arXiv Detail & Related papers (2020-11-04T15:41:47Z)
Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
AUSN: Approximately Uniform Quantization by Adaptively Superimposing Non-uniform Distribution for Deep Neural Networks [0.7378164273177589]
Existing uniform and non-uniform quantization methods exhibit an inherent conflict between the representing range and representing resolution. We propose a novel quantization method to quantize the weight and activation. The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
arXiv Detail & Related papers (2020-07-08T05:10:53Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.