Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
- URL: http://arxiv.org/abs/2006.11967v1
- Date: Mon, 22 Jun 2020 01:54:04 GMT
- Title: Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
- Authors: Yuan Wen, David Gregg
- Abstract summary: Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs).
We identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in neighbouring layers. Quantization reduces the precision of weights by replacing them with numerically similar values that require less storage. In this paper, we identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values. We observe that pruning and quantization both tend to drastically increase the number of repeated patterns in the weight tensors.
We investigate several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding, and other approaches inspired by block sparse matrix formats. We evaluate our approach on several well-known CNNs and find that we can achieve compaction ratios of 1.4x to 3.1x in addition to the savings from pruning and quantization.
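The effect described above can be illustrated with a small, hedged sketch: once a weight tensor has been pruned and quantized, its values collapse onto a handful of levels with highly skewed frequencies, so an entropy code such as Huffman coding stores the level indices in far fewer bits than a fixed-width encoding. The sketch below is not the authors' implementation; the pruning threshold, quantization step, tensor shape, and the per-value (rather than per-pattern or block-sparse) coding are illustrative assumptions.

```python
import heapq
from collections import Counter

import numpy as np


def huffman_code_lengths(stream):
    """Build a Huffman code for the symbols in `stream` and return
    {symbol: code length in bits}.  Standard greedy merge on a min-heap."""
    freq = Counter(stream)
    if len(freq) == 1:                        # degenerate case: one symbol
        return {next(iter(freq)): 1}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


# Toy "weight tensor": prune near-zero weights, quantize the rest to a small
# codebook of integer levels, then compare fixed-width storage of the level
# indices against their Huffman-coded size.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 32, 3, 3)).astype(np.float32)
weights[np.abs(weights) < 0.03] = 0.0         # crude magnitude pruning
step = 0.02                                   # crude uniform quantization step
levels = np.round(weights / step).astype(np.int32).ravel().tolist()

lengths = huffman_code_lengths(levels)
counts = Counter(levels)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = len(levels) * max(1, int(np.ceil(np.log2(len(counts)))))
print(f"distinct quantization levels: {len(counts)}")
print(f"fixed-width index bits: {fixed_bits}")
print(f"Huffman-coded bits:     {huffman_bits}")
print(f"compaction ratio:       {fixed_bits / huffman_bits:.2f}x")
```

Coding whole repeated patterns or blocks, as the paper investigates, would build the codebook over tuples of values instead of single levels; the single-level version above only shows why pruning and quantization make the weight stream so compressible.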
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Weight Fixing Networks [0.0]
We look to whole-network quantisation to minimise the entropy and number of unique parameters in a network.
We propose a new method, Weight Fixing Networks (WFN), designed to realise four model outcome objectives.
arXiv Detail & Related papers (2022-10-24T19:18:02Z) - Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convex neural networks.
We first normalize the weights of neural networks by a chain normalization rule, which is used for weight representation learning.
We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
arXiv Detail & Related papers (2022-08-08T19:11:03Z) - Quantized Sparse Weight Decomposition for Neural Network Compression [12.24566619983231]
We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA.
Unlike vector quantization, our method is applicable to both moderate and extreme compression regimes.
arXiv Detail & Related papers (2022-07-22T12:40:03Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - ReCU: Reviving the Dead Weights in Binary Neural Networks [153.6789340484509]
We explore the influence of "dead weights," which refer to a group of weights that are barely updated during the training of BNNs.
We prove that reviving the "dead weights" by ReCU can result in a smaller quantization error.
Our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2021-03-23T08:11:20Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods quantize the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Transform Quantization for CNN (Convolutional Neural Network) Compression [26.62351408292294]
We optimally transform weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate.
We show that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios.
arXiv Detail & Related papers (2020-09-02T16:33:42Z) - Retrain or not retrain? -- efficient pruning methods of deep CNN networks [0.30458514384586394]
Convolutional neural networks (CNNs) play a major role in image processing tasks such as image classification, object detection, and semantic segmentation.
Very often, CNN networks have from several to hundreds of stacked layers, with several megabytes of weights.
One of the possible methods to reduce complexity and memory footprint is pruning; a minimal pruning sketch follows this list.
arXiv Detail & Related papers (2020-02-12T23:24:28Z)
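As referenced in the last entry above, magnitude pruning is the basic step that most of these compression schemes build on. Below is a minimal, hedged sketch of one-shot magnitude pruning without retraining; the 80% sparsity target, the layer shape, and the helper name magnitude_prune are illustrative assumptions rather than any of the papers' actual implementations.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (one-shot,
    no retraining).  `sparsity` is the fraction of weights to remove."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude; everything at or below it is pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


# Illustrative convolutional layer: prune 80% of the weights and report
# how many survive.
rng = np.random.default_rng(1)
layer = rng.normal(0.0, 0.1, size=(128, 64, 3, 3)).astype(np.float32)
pruned = magnitude_prune(layer, sparsity=0.8)
kept = np.count_nonzero(pruned)
print(f"kept {kept} of {layer.size} weights ({kept / layer.size:.1%})")
```

Whether accuracy survives such one-shot pruning or requires retraining is exactly the trade-off studied in the last entry above.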