Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks
- URL: http://arxiv.org/abs/2011.14578v1
- Date: Mon, 30 Nov 2020 06:54:28 GMT
- Title: Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks
- Authors: Stone Yun and Alexander Wong
- Abstract summary: We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
- Score: 93.4221402881609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the proliferation of deep convolutional neural network (CNN) algorithms for mobile processing, limited-precision quantization has become an essential tool for CNN efficiency. Consequently, various works have sought to design fixed-precision quantization algorithms and quantization-focused optimization techniques that minimize quantization-induced performance degradation. However, there is little concrete understanding of how various CNN design decisions and best practices affect quantized inference behaviour. Weight initialization strategies are often associated with solving issues such as vanishing/exploding gradients, but an often-overlooked aspect is their impact on the final trained distributions of each layer. We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations across different CNN architectures. This fine-grained, layerwise analysis enables us to gain deep insights into how the initial weight distributions affect final accuracy and quantized behaviour. To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
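The kind of experiment the abstract describes can be pictured with a short, hypothetical sketch: build the same small CNN under different weight initialization schemes, then inspect the layerwise weight distributions and the error introduced by simulated uniform quantization. This is a minimal illustration assuming PyTorch; the architecture, bit-width, and quantizer below are illustrative choices, not the authors' exact setup (and a real study would train the networks before measuring).

```python
import torch
import torch.nn as nn

def build_cnn(init_fn):
    """Small CNN; init_fn is applied to every conv/linear weight."""
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
    )
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            init_fn(m.weight)
            nn.init.zeros_(m.bias)
    return model

def uniform_quantize(x, num_bits=8):
    """Symmetric, per-tensor uniform quantization (simulated)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

inits = {
    "he_normal": nn.init.kaiming_normal_,
    "glorot_uniform": nn.init.xavier_uniform_,
}

for name, init_fn in inits.items():
    torch.manual_seed(0)
    model = build_cnn(init_fn)
    # In the real study the models would be trained first; here we only
    # inspect the initial layerwise distributions and their quantization error.
    for layer_name, p in model.named_parameters():
        if p.dim() < 2:
            continue  # skip biases
        q = uniform_quantize(p.data)
        rmse = (p.data - q).pow(2).mean().sqrt().item()
        print(f"{name:15s} {layer_name:10s} std={p.data.std().item():.4f} "
              f"range=({p.data.min().item():.3f}, {p.data.max().item():.3f}) "
              f"rmse={rmse:.6f}")
```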
Related papers
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds a decomposed approximation directly with quantized factors.
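As a rough illustration of the idea (not the paper's actual algorithm), one can factorize a weight matrix with a truncated SVD and then quantize the two factors, comparing the reconstruction error against quantizing the full matrix. A minimal NumPy sketch, with the rank and bit-width chosen arbitrarily:

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Symmetric uniform quantization of an array."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))          # a dense layer's weight matrix
rank = 64

# Truncated SVD factorization: W ~= A @ B with A (256 x rank), B (rank x 512).
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]
B = Vt[:rank, :]

# Quantize either the full matrix or the low-rank factors.
# (For a random matrix most of the factorized error comes from truncation;
# this only shows the mechanics, not any result from the paper.)
W_q = uniform_quantize(W)
W_fact_q = uniform_quantize(A) @ uniform_quantize(B)

print("quantized full matrix rel. error:", np.linalg.norm(W - W_q) / np.linalg.norm(W))
print("quantized factors rel. error    :", np.linalg.norm(W - W_fact_q) / np.linalg.norm(W))
```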
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
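Quantization-aware training in general (not BiTAT's specific aggregated-transformation scheme) is usually simulated by inserting a fake-quantization step into the forward pass and passing gradients straight through it. A minimal PyTorch sketch of that generic mechanism, with an arbitrary bit-width:

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    """Uniform fake quantization with a straight-through gradient."""
    @staticmethod
    def forward(ctx, x, num_bits):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged.
        return grad_output, None

class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized during training."""
    def __init__(self, in_features, out_features, num_bits=4):
        super().__init__(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.bias)

# One toy training step to show the mechanism end to end.
layer = QuantLinear(16, 4, num_bits=4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()   # gradients flow through the fake quantizer
opt.step()        # the full-precision weights are updated
print("loss:", loss.item())
```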
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
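A simplified sketch of a greedy, data-driven path-following quantizer in the spirit of GPFQ (the alphabet, scaling, and data below are arbitrary choices, not the paper's exact formulation): each weight of a neuron is quantized in turn so that the running error between the true and quantized layer outputs on sample data stays small.

```python
import numpy as np

def greedy_quantize_neuron(w, X, levels):
    """Quantize the weight vector w of one neuron against input samples X.

    X has shape (num_samples, num_weights); column t is the input feature
    that multiplies weight w[t]. 'levels' is the quantization alphabet.
    """
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])              # running output error
    for t in range(len(w)):
        x_t = X[:, t]
        # Value that best cancels the accumulated error along this coordinate.
        target = w[t] + (u @ x_t) / (x_t @ x_t + 1e-12)
        q[t] = levels[np.argmin(np.abs(levels - target))]
        u = u + (w[t] - q[t]) * x_t       # update the running error
    return q

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 64))            # calibration inputs
w = rng.normal(scale=0.1, size=64)        # one neuron's weights
levels = np.linspace(-0.3, 0.3, 16)       # a 4-bit uniform alphabet (arbitrary)

q_greedy = greedy_quantize_neuron(w, X, levels)
q_naive = levels[np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)]
print("greedy output error:", np.linalg.norm(X @ (w - q_greedy)))
print("naive  output error:", np.linalg.norm(X @ (w - q_naive)))
```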
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
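One way to picture the bit-drop idea (a loose analogy only, not the paper's DropBits formulation): during training, randomly reduce the effective bit-width of a weight quantizer on each forward pass, much as dropout randomly removes neurons. A toy PyTorch sketch with arbitrary bit-widths and drop probability:

```python
import torch

def quantize(x, num_bits):
    """Symmetric uniform quantization at a given bit-width."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

def bit_drop_quantize(x, max_bits=8, drop_prob=0.3, training=True):
    """Randomly drop low-order bits during training (illustrative only)."""
    num_bits = max_bits
    if training:
        # Each bit beyond the minimum of 2 survives with probability 1 - drop_prob.
        survivors = (torch.rand(max_bits - 2) > drop_prob).sum().item()
        num_bits = 2 + survivors
    return quantize(x, num_bits), num_bits

w = torch.randn(64, 64)
for step in range(3):
    w_q, bits = bit_drop_quantize(w)
    err = ((w - w_q).norm() / w.norm()).item()
    print(f"step {step}: used {bits} bits, relative error {err:.4f}")
```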
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Recurrence of Optimum for Training Weight and Activation Quantized Networks [4.103701929881022]
Training deep learning models with low-precision weights and activations involves a demanding optimization task.
We show how to overcome the discrete, non-differentiable nature of network quantization during training.
We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
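The setting can be pictured with a standard straight-through training loop (a generic sketch, not the paper's algorithm): full-precision weights are maintained and updated, the forward pass uses their quantized version, and the sequence of quantized weight states can be logged to look for recurring states.

```python
import torch
import torch.nn as nn

def quantize(w, num_bits=2):
    """Symmetric uniform quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

torch.manual_seed(0)
layer = nn.Linear(8, 1)
opt = torch.optim.SGD(layer.parameters(), lr=0.05)
x, y = torch.randn(64, 8), torch.randn(64, 1)

states = []
for step in range(200):
    # Straight-through trick: the forward pass uses quantized weights,
    # gradients flow to the underlying full-precision weights unchanged.
    w_q = layer.weight + (quantize(layer.weight) - layer.weight).detach()
    loss = nn.functional.mse_loss(nn.functional.linear(x, w_q, layer.bias), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    states.append(quantize(layer.weight.detach()).clone())

# Count how often the final quantized weight state was visited earlier,
# a crude way to look at recurrence of quantized states during training.
revisits = sum(torch.equal(s, states[-1]) for s in states[:-1])
print("final quantized state appeared", revisits, "times earlier in training")
```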
arXiv Detail & Related papers (2020-12-10T09:14:43Z)
- FactorizeNet: Progressive Depth Factorization for Efficient Network Architecture Exploration Under Quantization Constraints [93.4221402881609]
We introduce a progressive depth factorization strategy for efficient CNN architecture exploration under quantization constraints.
By algorithmically increasing the granularity of depth factorization in a progressive manner, the proposed strategy enables a fine-grained, low-level analysis of layer-wise distributions.
Such a progressive depth factorization strategy also enables efficient identification of the optimal depth-factorized macroarchitecture design.
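One common way to realize depth factorization is with grouped convolutions, where a larger group count means a finer factorization. The sketch below (an illustration assuming PyTorch, not the FactorizeNet code) builds the same block at several factorization levels and compares parameter counts and layerwise weight statistics.

```python
import torch
import torch.nn as nn

def factorized_conv(in_ch, out_ch, groups):
    """A 3x3 convolution factorized along depth via grouped convolution,
    followed by a 1x1 convolution to mix information across groups."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=groups),
        nn.Conv2d(in_ch, out_ch, 1),
    )

x = torch.randn(1, 64, 32, 32)
for groups in (1, 2, 4, 8, 16):            # progressively finer factorization
    block = factorized_conv(64, 128, groups)
    params = sum(p.numel() for p in block.parameters())
    w = block[0].weight.detach()
    print(f"groups={groups:2d} params={params:7d} "
          f"3x3 weight std={w.std().item():.4f} "
          f"range=({w.min().item():.3f}, {w.max().item():.3f})")
    assert block(x).shape == (1, 128, 32, 32)
```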
arXiv Detail & Related papers (2020-11-30T07:12:26Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
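Tracking a Hessian norm in practice usually means estimating the dominant Hessian eigenvalue with Hessian-vector products. The sketch below shows a generic power-iteration estimator in PyTorch (an illustration of the general diagnostic, not the paper's exact estimator); the model and data are arbitrary.

```python
import torch
import torch.nn as nn

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate the dominant (largest-magnitude) Hessian eigenvalue via
    power iteration, using Hessian-vector products from double backprop."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Hessian-vector product: d/dtheta of (grad . v).
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(hv, v))   # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.item()

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 50), nn.Tanh(), nn.Linear(50, 10))
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
loss = nn.functional.cross_entropy(model(x), y)

params = list(model.parameters())
print("estimated top Hessian eigenvalue:", hessian_spectral_norm(loss, params))
```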
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
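The regularizer can be sketched generically as adding the $\ell_1$ norm of the loss gradient with respect to the weights to the training objective, which penalizes first-order sensitivity to the small perturbations that quantization introduces. A minimal double-backpropagation sketch in PyTorch; the model, data, and regularization strength are arbitrary choices, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
lam = 0.01                                # regularization strength (arbitrary)

for step in range(5):
    loss = nn.functional.cross_entropy(model(x), y)
    # Gradient of the task loss w.r.t. the weights, kept in the graph so the
    # l1 penalty on it can itself be backpropagated (double backprop).
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    grad_l1 = sum(g.abs().sum() for g in grads)
    total = loss + lam * grad_l1
    opt.zero_grad()
    total.backward()
    opt.step()
    print(f"step {step}: task loss {loss.item():.4f}, grad l1 {grad_l1.item():.2f}")
```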
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.