Probabilistic Weight Fixing: Large-scale training of neural network
weight uncertainties for quantization
- URL: http://arxiv.org/abs/2309.13575v3
- Date: Tue, 3 Oct 2023 19:53:34 GMT
- Title: Probabilistic Weight Fixing: Large-scale training of neural network
weight uncertainties for quantization
- Authors: Christopher Subia-Waud and Srinandan Dasmahapatra
- Abstract summary: Weight-sharing quantization has emerged as a technique to reduce energy expenditure during inference in large neural networks.
This paper proposes a probabilistic framework based on Bayesian neural networks (BNNs) and a variational relaxation to identify which weights can be moved to which cluster centre.
Our method outperforms the state-of-the-art quantization method's top-1 accuracy by 1.6% on ImageNet using DeiT-Tiny.
- Score: 7.2282857478457805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight-sharing quantization has emerged as a technique to reduce energy
expenditure during inference in large neural networks by constraining their
weights to a limited set of values. However, existing methods for
weight-sharing quantization often make assumptions about the treatment of
weights based on value alone that neglect the unique role weight position
plays. This paper proposes a probabilistic framework based on Bayesian neural
networks (BNNs) and a variational relaxation to identify which weights can be
moved to which cluster centre and to what degree based on their individual
position-specific learned uncertainty distributions. We introduce a new
initialisation setting and a regularisation term which allow for the training
of BNNs under complex dataset-model combinations. By leveraging the flexibility
of weight values captured through a probability distribution, we enhance noise
resilience and downstream compressibility. Our iterative clustering procedure
demonstrates superior compressibility and higher accuracy compared to
state-of-the-art methods on both ResNet models and the more complex
transformer-based architectures. In particular, our method outperforms the
state-of-the-art quantization method's top-1 accuracy by 1.6% on ImageNet using
DeiT-Tiny, with its 5 million+ weights now represented by only 296 unique
values.
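The full variational formulation is in the paper itself; as a rough illustration of the core idea (letting each weight's position-specific uncertainty decide which cluster centre it may move to, and to what degree), the NumPy sketch below clusters posterior means into a small shared codebook using per-weight standard deviations. The function name, the likelihood-based assignment, and the precision-weighted centre update are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def uncertainty_aware_weight_sharing(mu, sigma, n_clusters=16, n_iters=10, seed=0):
    """Cluster weight means into a small shared codebook, letting each weight's
    learned standard deviation decide how freely it can move (hypothetical sketch).

    mu, sigma : per-weight posterior means and std-devs from a trained BNN."""
    rng = np.random.default_rng(seed)
    centres = rng.choice(mu, size=n_clusters, replace=False)
    for _ in range(n_iters):
        # Log-likelihood of each centre under each weight's Gaussian posterior:
        # high-sigma weights tolerate distant centres, low-sigma weights do not.
        log_lik = -0.5 * ((mu[:, None] - centres[None, :]) / sigma[:, None]) ** 2
        assign = log_lik.argmax(axis=1)
        # Update each centre as the precision-weighted mean of its members,
        # so confident (low-sigma) weights dominate the centre's position.
        for k in range(n_clusters):
            members = assign == k
            if members.any():
                centres[k] = np.average(mu[members], weights=1.0 / sigma[members] ** 2)
    return centres, assign

# Toy usage: 10k weights with heterogeneous uncertainty.
mu = np.random.randn(10_000) * 0.05
sigma = np.abs(np.random.randn(10_000)) * 0.02 + 1e-3
centres, assign = uncertainty_aware_weight_sharing(mu, sigma)
print(np.unique(assign).size, "cluster centres in use")
```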
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of the weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
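The paper's improved KL metric is not reproduced here; as a hedged illustration of the general idea of picking an activation quantization scale with a KL criterion, the sketch below searches clipping thresholds that minimise the divergence between the original and quantized activation histograms. Function names and histogram settings are assumptions for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    p = p / (p.sum() + eps); q = q / (q.sum() + eps)
    mask = p > 0
    return np.sum(p[mask] * np.log((p[mask] + eps) / (q[mask] + eps)))

def pick_activation_scale(acts, n_bits=8, n_candidates=50):
    """Choose a clipping threshold by minimising the KL divergence between the
    original and the clipped/quantized activation histograms (generic KL
    calibration; the paper's improved metric differs in detail)."""
    acts = np.abs(acts)
    max_val, levels = acts.max(), 2 ** (n_bits - 1) - 1
    ref_hist, edges = np.histogram(acts, bins=1024, range=(0, max_val))
    best_t, best_kl = max_val, np.inf
    for t in np.linspace(max_val / n_candidates, max_val, n_candidates):
        scale = t / levels
        # Simulate quantization: clip, round onto the grid, histogram again.
        q = np.clip(np.round(np.clip(acts, 0, t) / scale) * scale, 0, max_val)
        q_hist, _ = np.histogram(q, bins=edges)
        kl = kl_divergence(ref_hist.astype(float), q_hist.astype(float))
        if kl < best_kl:
            best_kl, best_t = kl, t
    return best_t / levels  # per-tensor scale

acts = np.abs(np.random.randn(100_000)) ** 1.5  # toy activation sample
print("chosen scale:", pick_activation_scale(acts))
```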
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds a decomposed approximation directly with quantized factors.
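As a rough sketch of finding a decomposed approximation directly with quantized factors, the snippet below alternates between refitting one low-rank factor in closed form and requantizing it. The rank, bit-width, and alternating scheme are assumptions, not the paper's algorithm.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Uniform symmetric quantization to n_bits."""
    levels = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / levels + 1e-12
    return np.round(x / scale) * scale

def quantized_low_rank(W, rank=64, n_bits=8, n_iters=10):
    """Approximate W ~= A @ B with both factors quantized, by alternating
    least-squares refits and requantization (illustrative sketch only)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * np.sqrt(S[:rank])
    B = np.sqrt(S[:rank])[:, None] * Vt[:rank]
    for _ in range(n_iters):
        Aq = quantize(A, n_bits)
        B = np.linalg.lstsq(Aq, W, rcond=None)[0]       # refit B against quantized A
        Bq = quantize(B, n_bits)
        A = np.linalg.lstsq(Bq.T, W.T, rcond=None)[0].T  # refit A against quantized B
    return quantize(A, n_bits), Bq

W = np.random.randn(256, 512)
A, B = quantized_low_rank(W)
print("relative error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```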
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
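One way to read "any precision on demand from a single model" is a nested, residual representation of the weights; the hypothetical sketch below stacks 1-bit residual layers so that summing the first k layers yields a lower-precision model. This is an illustrative reading, not the paper's construction.

```python
import numpy as np

def vertical_layers(w, n_layers=8):
    """Decompose weights into a stack of 1-bit residual layers: each layer is a
    sign pattern times one scalar, fitted to the remaining residual."""
    layers, residual = [], w.copy()
    for _ in range(n_layers):
        alpha = np.abs(residual).mean()      # per-layer scale
        signs = np.sign(residual)
        signs[signs == 0] = 1.0
        layers.append(alpha * signs)         # 1 bit per weight + one scalar
        residual = residual - alpha * signs
    return layers

def reconstruct(layers, k):
    """On-demand precision: use only the first k vertical layers."""
    return sum(layers[:k])

w = np.random.randn(4096)
layers = vertical_layers(w)
for k in (1, 2, 4, 8):
    err = np.linalg.norm(w - reconstruct(layers, k)) / np.linalg.norm(w)
    print(f"{k} layers -> relative error {err:.3f}")
```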
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convolutional neural networks.
We first normalize the weights of neural networks by a chain normalization rule, which is used for weight representation learning.
We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
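The chain normalization rule itself is defined in the paper; a hypothetical reading for ReLU networks is to normalise each layer's scale and push the removed scale into the next layer, so weights from different runs become comparable, as in the sketch below.

```python
import numpy as np

def chain_normalize(weights):
    """Normalize each layer to unit Frobenius norm and carry the removed scale
    into the next layer. For ReLU networks this preserves the function up to
    the final output scale (a hypothetical reading of the chain rule)."""
    normed, carry = [], 1.0
    for W in weights:
        W = W * carry
        scale = np.linalg.norm(W)
        normed.append(W / scale)
        carry = scale
    return normed

def weight_similarity(net_a, net_b):
    """Cosine similarity between flattened, chain-normalized weights."""
    a = np.concatenate([W.ravel() for W in chain_normalize(net_a)])
    b = np.concatenate([W.ravel() for W in chain_normalize(net_b)])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

net_a = [np.random.randn(64, 32), np.random.randn(32, 10)]
net_b = [2.0 * net_a[0], 0.5 * net_a[1]]   # same ReLU function, rescaled layers
print(weight_similarity(net_a, net_b))      # ~1.0 after normalization
```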
arXiv Detail & Related papers (2022-08-08T19:11:03Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
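BiTAT's task-dependent aggregated transformation is not reproduced here; the PyTorch snippet below only shows the generic 1-bit quantization-aware-training baseline (sign plus scale in the forward pass, straight-through estimator in the backward pass) whose degradation the paper aims to alleviate.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """1-bit weight quantization with a straight-through estimator: forward
    uses sign(w) * mean(|w|); backward passes gradients through unchanged."""
    @staticmethod
    def forward(ctx, w):
        return w.sign() * w.abs().mean()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

w = torch.randn(128, 64, requires_grad=True)
loss = (BinarizeSTE.apply(w) ** 2).sum()   # stand-in for a task loss
loss.backward()                             # gradients reach the latent weights
print(w.grad.shape)
```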
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Standard Deviation-Based Quantization for Deep Neural Networks [17.495852096822894]
Quantization of deep neural networks is a promising approach that reduces the inference cost.
We propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network's weight and activation distributions.
Our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during the quantization process.
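As a hedged illustration of tying quantization intervals to the weight distribution, the sketch below clips at a multiple of the standard deviation and treats weights that round to the zero level as pruned; the bit-width and multiplier are assumptions, not the paper's learned values.

```python
import numpy as np

def std_quantize(w, n_bits=3, k=3.0):
    """Quantize weights uniformly inside [-k*std, +k*std]; values that round to
    the zero level are effectively pruned (generic sketch only)."""
    step = k * w.std() / (2 ** (n_bits - 1) - 1)
    levels = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / step), -levels, levels) * step
    pruned_frac = np.mean(q == 0.0)
    return q, pruned_frac

w = np.random.randn(100_000) * 0.05
q, pruned = std_quantize(w)
print(f"unique values: {np.unique(q).size}, pruned: {pruned:.1%}")
```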
arXiv Detail & Related papers (2022-02-24T23:33:47Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
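GPFQ's greedy path-following mechanism quantizes one weight at a time while tracking the accumulated pre-activation error on calibration data; a minimal single-neuron sketch (alphabet and data sizes are arbitrary choices):

```python
import numpy as np

def gpfq_quantize(X, w, alphabet):
    """Greedy path-following quantization of one neuron.
    X: (m, n) calibration inputs, w: (n,) weights, alphabet: allowed values."""
    m, n = X.shape
    q = np.zeros(n)
    u = np.zeros(m)                                   # accumulated pre-activation error
    for t in range(n):
        x_t = X[:, t]
        target = (x_t @ (u + w[t] * x_t)) / (x_t @ x_t + 1e-12)
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]
        u = u + w[t] * x_t - q[t] * x_t               # carry the error forward
    return q

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 256))
w = rng.standard_normal(256) / 16
alphabet = np.linspace(-0.5, 0.5, 2 ** 4)             # a 4-bit grid
q = gpfq_quantize(X, w, alphabet)
print("relative pre-activation error:",
      np.linalg.norm(X @ w - X @ q) / np.linalg.norm(X @ w))
```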
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
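The paper's source-coding format is not specified here; as a back-of-the-envelope illustration of why pruning plus codebook quantization compresses so well, the sketch below estimates storage from the entropy of the code indices plus a naive position cost. All constants are illustrative assumptions.

```python
import numpy as np

def compressed_size_estimate(w, prune_frac=0.9, codebook_bits=4):
    """Rough storage estimate after magnitude pruning + codebook quantization:
    surviving weights become codebook indices, costed at their empirical
    entropy, plus a naive absolute-position cost (stand-in for source coding)."""
    thresh = np.quantile(np.abs(w), prune_frac)
    kept = w[np.abs(w) > thresh]
    codebook = np.linspace(kept.min(), kept.max(), 2 ** codebook_bits)
    idx = np.argmin(np.abs(kept[:, None] - codebook[None, :]), axis=1)
    counts = np.bincount(idx, minlength=codebook.size).astype(float)
    p = counts[counts > 0] / counts.sum()
    entropy_bits = -(p * np.log2(p)).sum() * kept.size
    index_bits = kept.size * np.ceil(np.log2(w.size))
    return (entropy_bits + index_bits) / (32 * w.size)   # vs. dense float32

w = np.random.randn(1_000_000) * 0.02
print(f"~{compressed_size_estimate(w):.2%} of dense float32 size")
```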
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search for them accurately.
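The continuous relaxation behind treating discrete weights as searchable variables can be sketched as a softmax distribution over candidate values whose expectation is used in the forward pass; the PyTorch snippet below is a minimal, hypothetical version of that idea, not the paper's exact recipe.

```python
import torch

# Each weight holds a distribution over a few candidate low-bit values; the
# forward pass uses the expected value, so the assignment logits are trainable
# by ordinary gradient descent and can be hardened (argmax) after training.
candidates = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])        # 5 discrete values
logits = torch.randn(128, 64, len(candidates), requires_grad=True)
x = torch.randn(32, 64)

probs = torch.softmax(logits / 0.5, dim=-1)                    # temperature 0.5
w_soft = (probs * candidates).sum(dim=-1)                      # expected (soft) weights
loss = (x @ w_soft.t()).pow(2).mean()                          # stand-in task loss
loss.backward()                                                # gradients reach the logits

w_hard = candidates[logits.argmax(dim=-1)]                     # discrete weights at the end
print(w_soft.shape, w_hard.unique())
```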
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
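A minimal double-backprop sketch of adding an $\ell_1$ penalty on the loss gradient with respect to the weights, which is the mechanism that bounds the first-order effect of rounding noise under quantization; layer sizes and the penalty weight are illustrative.

```python
import torch

torch.manual_seed(0)
W = torch.randn(10, 32, requires_grad=True)              # a single linear layer
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
lam = 1e-2                                                # penalty weight (illustrative)

logits = x @ W.t()
task_loss = torch.nn.functional.cross_entropy(logits, y)
# Gradient of the task loss w.r.t. the weights, kept in the graph so the
# penalty on its l1-norm can itself be backpropagated (double backprop).
(grad_W,) = torch.autograd.grad(task_loss, W, create_graph=True)
loss = task_loss + lam * grad_W.abs().sum()
loss.backward()
print(loss.item(), W.grad.norm().item())
```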
arXiv Detail & Related papers (2020-02-18T12:31:34Z)