BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation
- URL: http://arxiv.org/abs/2207.01394v1
- Date: Mon, 4 Jul 2022 13:25:49 GMT
- Title: BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation
- Authors: Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang, Sung Ju Hwang,
Yonina C. Eldar
- Abstract summary: Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
- Score: 116.26521375592759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network quantization aims to transform high-precision weights and
activations of a given neural network into low-precision weights/activations
for reduced memory usage and computation, while preserving the performance of
the original model. However, extreme quantization (1-bit weight/1-bit
activations) of compactly-designed backbone architectures (e.g., MobileNets)
often used for edge-device deployments results in severe performance
degeneration. This paper proposes a novel Quantization-Aware Training (QAT)
method that can effectively alleviate performance degeneration even with
extreme quantization by focusing on the inter-weight dependencies, between the
weights within each layer and across consecutive layers. To minimize the
quantization impact of each weight on others, we perform an orthonormal
transformation of the weights at each layer by training an input-dependent
correlation matrix and importance vector, such that each weight is disentangled
from the others. Then, we quantize the weights based on their importance to
minimize the loss of the information from the original weights/activations. We
further perform progressive layer-wise quantization from the bottom layer to
the top, so that quantization at each layer reflects the quantized
distributions of weights and activations at previous layers. We validate the
effectiveness of our method on various benchmark datasets against strong neural
quantization baselines, demonstrating that it alleviates the performance
degeneration on ImageNet and successfully preserves the full-precision model
performance on CIFAR-100 with compact backbone networks.
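To make the recipe in the abstract more concrete, below is a minimal, hypothetical sketch of binarizing one layer's weights in a rotated basis with an importance vector, written in PyTorch. The function names, the QR-based parameterization of the orthonormal transformation, and the importance-weighted scaling are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): binarize one layer's weights
# in a basis defined by a trainable orthonormal transformation, with an
# importance vector weighting the binarization scale.
import torch

def binarize_layer(W, A, importance):
    """W: (out, in) full-precision weights.
    A: (in, in) unconstrained trainable matrix; its QR factor gives the
       orthonormal transformation applied to the weight dimensions.
    importance: (in,) non-negative trainable importance per dimension."""
    Q, _ = torch.linalg.qr(A)                # orthonormal transform
    W_rot = W @ Q                            # weights in the rotated basis
    w = importance.clamp(min=0)
    w = w / (w.sum() + 1e-12)                # normalized importance weights
    scale = (W_rot.abs() * w).sum(dim=1, keepdim=True)  # per-row FP scale
    W_bin = scale * W_rot.sign()             # 1-bit weights plus scale
    return W_bin @ Q.t()                     # map back to the original basis

# toy usage
W = torch.randn(64, 128)
A = torch.randn(128, 128)
importance = torch.rand(128)
W_q = binarize_layer(W, A, importance)
print(float((W_q - W).norm() / W.norm()))   # relative quantization error
```

In an actual QAT loop, A and the importance vector would be trained jointly with the task loss (with a straight-through estimator for the sign), and layers would be quantized progressively from the bottom to the top as described in the abstract.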
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
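The improved KL metric itself is not specified in this summary; for context, the following is a minimal sketch of the standard KL-divergence calibration baseline for picking an activation clipping threshold. The function names, bin counts, and level count are illustrative assumptions, and this is the generic baseline rather than the paper's improved metric.

```python
# Generic KL-divergence calibration of an activation clipping threshold
# (the common baseline; not the paper's improved metric).
import numpy as np

def kl_div(p, q, eps=1e-12):
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / (q[m] + eps))))

def kl_calibrate(acts, n_bins=2048, n_levels=128):
    """Pick the clipping threshold whose quantized distribution is closest
    (in KL divergence) to the observed activation distribution."""
    hist, edges = np.histogram(np.abs(acts), bins=n_bins)
    hist = hist.astype(np.float64)
    best_kl, best_clip = np.inf, edges[-1]
    for i in range(n_levels, n_bins + 1):
        # reference: first i bins, with the clipped tail folded into the last bin
        p = hist[:i].copy()
        p[-1] += hist[i:].sum()
        # candidate: first i bins merged into n_levels groups, then expanded;
        # each group's mass is spread over its originally non-empty bins
        ref = hist[:i]
        q = np.zeros(i)
        bounds = np.linspace(0, i, n_levels + 1).astype(int)
        for j in range(n_levels):
            lo, hi = bounds[j], bounds[j + 1]
            group = ref[lo:hi]
            nz = group > 0
            if nz.any():
                q[lo:hi][nz] = group.sum() / nz.sum()
        kl = kl_div(p, q)
        if kl < best_kl:
            best_kl, best_clip = kl, edges[i]
    return best_clip

acts = np.random.laplace(scale=1.0, size=200000)  # stand-in activations
print(kl_calibrate(acts))
```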
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds a decomposed approximation directly with quantized factors.
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
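The idea of fitting a decomposition directly with quantized factors can be illustrated with a toy alternating scheme on a single weight matrix. The SVD initialization, the uniform per-tensor quantizer, and the ALS refinement below are assumptions for illustration, not the paper's algorithm.

```python
# Toy illustration (not the paper's algorithm): fit W ~= U @ V where both
# factors are kept uniformly quantized, alternating least-squares updates
# with a re-quantization step after each refit.
import numpy as np

def uniform_quantize(x, bits=8):
    """Symmetric uniform quantizer with a per-tensor scale."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def quantized_factorization(W, rank=16, bits=8, iters=10):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = uniform_quantize(U[:, :rank] * s[:rank], bits)
    V = uniform_quantize(Vt[:rank, :], bits)
    for _ in range(iters):
        U = uniform_quantize(W @ np.linalg.pinv(V), bits)   # refit U, re-quantize
        V = uniform_quantize(np.linalg.pinv(U) @ W, bits)   # refit V, re-quantize
    return U, V

W = np.random.randn(128, 256)
U, V = quantized_factorization(W)
print(np.linalg.norm(W - U @ V) / np.linalg.norm(W))  # relative error
```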
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Quantized Proximal Averaging Network for Analysis Sparse Coding [23.080395291046408]
We unfold an iterative algorithm into a trainable network that facilitates learning sparsity prior to quantization.
We demonstrate applications to compressed image recovery and magnetic resonance image reconstruction.
arXiv Detail & Related papers (2021-05-13T12:05:35Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods derive the quantized weights by quantizing the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Recurrence of Optimum for Training Weight and Activation Quantized Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the discrete nature of network quantization.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
arXiv Detail & Related papers (2020-12-10T09:14:43Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
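Tracking the Hessian norm as in the entry above can be approximated without ever forming the Hessian. The following is a small PyTorch sketch using Hessian-vector products and power iteration; this is a standard trick assumed here for illustration, not necessarily the paper's exact estimator.

```python
# Sketch: estimate the top Hessian eigenvalue of the loss w.r.t. the
# parameters via power iteration on Hessian-vector products.
import torch
import torch.nn as nn

def top_hessian_eigenvalue(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the params
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = v @ hv                      # Rayleigh quotient estimate v^T H v
        v = hv / (hv.norm() + 1e-12)      # power-iteration update
    return eig

# toy usage on a small model and batch
model = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 4))
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
print(float(top_hessian_eigenvalue(loss, list(model.parameters()))))
```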
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
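A minimal sketch of the gradient-penalty idea in the last entry is shown below, assuming a toy model and a hand-picked regularization weight. The double backward through torch.autograd.grad is the essential ingredient; the exact penalty used in the paper (for example, which tensors it covers and how the weight is set) may differ.

```python
# Sketch: one training step with an l1 penalty on the gradient of the loss
# with respect to the weights (model, data, and lambda are placeholders).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
lam = 0.05  # regularization strength (illustrative value)

x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

loss = criterion(model(x), y)
# gradients of the task loss w.r.t. the weights, kept in the graph so the
# penalty itself can be backpropagated (double backward)
grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
grad_l1 = sum(g.abs().sum() for g in grads)
total = loss + lam * grad_l1

opt.zero_grad()
total.backward()
opt.step()
```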