Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
- URL: http://arxiv.org/abs/2006.11967v1
- Date: Mon, 22 Jun 2020 01:54:04 GMT
- Title: Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
- Authors: Yuan Wen, David Gregg
- Abstract summary: Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs).
We identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in neighbouring layers. Quantization reduces the precision of weights by replacing them with numerically similar values that require less storage. In this paper, we identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values. We observe that pruning and quantization both tend to drastically increase the number of repeated patterns in the weight tensors.
We investigate several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding, and other approaches inspired by block sparse matrix formats. We evaluate our approach on several well-known CNNs and find that we can achieve compaction ratios of 1.4x to 3.1x in addition to the savings from pruning and quantization.
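The effect described above can be illustrated with a small, hedged sketch: once a weight tensor has been pruned and quantized, its values collapse onto a handful of levels with highly skewed frequencies, so an entropy code such as Huffman coding stores the level indices in far fewer bits than a fixed-width encoding. The sketch below is not the authors' implementation; the pruning threshold, quantization step, tensor shape, and the per-value (rather than per-pattern or block-sparse) coding are illustrative assumptions.

```python
import heapq
from collections import Counter

import numpy as np


def huffman_code_lengths(stream):
    """Build a Huffman code for the symbols in `stream` and return
    {symbol: code length in bits}.  Standard greedy merge on a min-heap."""
    freq = Counter(stream)
    if len(freq) == 1:                        # degenerate case: one symbol
        return {next(iter(freq)): 1}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


# Toy "weight tensor": prune near-zero weights, quantize the rest to a small
# codebook of integer levels, then compare fixed-width storage of the level
# indices against their Huffman-coded size.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 32, 3, 3)).astype(np.float32)
weights[np.abs(weights) < 0.03] = 0.0         # crude magnitude pruning
step = 0.02                                   # crude uniform quantization step
levels = np.round(weights / step).astype(np.int32).ravel().tolist()

lengths = huffman_code_lengths(levels)
counts = Counter(levels)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = len(levels) * max(1, int(np.ceil(np.log2(len(counts)))))
print(f"distinct quantization levels: {len(counts)}")
print(f"fixed-width index bits: {fixed_bits}")
print(f"Huffman-coded bits:     {huffman_bits}")
print(f"compaction ratio:       {fixed_bits / huffman_bits:.2f}x")
```

Coding whole repeated patterns or blocks, as the paper investigates, would build the codebook over tuples of values instead of single levels; the single-level version above only shows why pruning and quantization make the weight stream so compressible.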
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Weight Fixing Networks [0.0]
We look to whole-network quantisation to minimise the entropy and number of unique parameters in a network.
We propose a new method, Weight Fixing Networks (WFN), designed to realise four model outcome objectives.
arXiv Detail & Related papers (2022-10-24T19:18:02Z) - Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing [58.401504709365284]
We present a weight similarity measure that can quantify the weight similarity of non-convex neural networks.
We first normalize the weights of neural networks by a chain normalization rule, which is used for weight representation learning.
We extend the traditional hypothesis-testing method to validate the hypothesis on the weight similarity of neural networks.
arXiv Detail & Related papers (2022-08-08T19:11:03Z) - Quantized Sparse Weight Decomposition for Neural Network Compression [12.24566619983231]
We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA.
Unlike vector quantization, our method is applicable to both moderate and extreme compression regimes.
arXiv Detail & Related papers (2022-07-22T12:40:03Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - ReCU: Reviving the Dead Weights in Binary Neural Networks [153.6789340484509]
We explore the influence of "dead weights," which refer to a group of weights that are barely updated during the training of BNNs.
We prove that reviving the "dead weights" by ReCU can result in a smaller quantization error.
Our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2021-03-23T08:11:20Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods quantize the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Transform Quantization for CNN (Convolutional Neural Network) Compression [26.62351408292294]
We optimally transform weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate.
We show that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios.
arXiv Detail & Related papers (2020-09-02T16:33:42Z) - Retrain or not retrain? -- efficient pruning methods of deep CNN networks [0.30458514384586394]
Convolutional neural networks (CNNs) play a major role in image processing tasks such as image classification, object detection, and semantic segmentation.
Very often, CNN networks have from several to hundreds of stacked layers, with several megabytes of weights.
One of the possible methods to reduce complexity and memory footprint is pruning; a minimal pruning sketch follows this list.
arXiv Detail & Related papers (2020-02-12T23:24:28Z)
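As referenced in the last entry above, magnitude pruning is the basic step that most of these compression schemes build on. Below is a minimal, hedged sketch of one-shot magnitude pruning without retraining; the 80% sparsity target, the layer shape, and the helper name magnitude_prune are illustrative assumptions rather than any of the papers' actual implementations.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (one-shot,
    no retraining).  `sparsity` is the fraction of weights to remove."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude; everything at or below it is pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


# Illustrative convolutional layer: prune 80% of the weights and report
# how many survive.
rng = np.random.default_rng(1)
layer = rng.normal(0.0, 0.1, size=(128, 64, 3, 3)).astype(np.float32)
pruned = magnitude_prune(layer, sparsity=0.8)
kept = np.count_nonzero(pruned)
print(f"kept {kept} of {layer.size} weights ({kept / layer.size:.1%})")
```

Whether accuracy survives such one-shot pruning or requires retraining is exactly the trade-off studied in the last entry above.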