A Greedy Algorithm for Quantizing Neural Networks
- URL: http://arxiv.org/abs/2010.15979v2
- Date: Sun, 15 Aug 2021 04:42:45 GMT
- Title: A Greedy Algorithm for Quantizing Neural Networks
- Authors: Eric Lybrand, Rayan Saab
- Abstract summary: We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
- Score: 4.683806391173103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new computationally efficient method for quantizing the weights
of pre-trained neural networks that is general enough to handle both
multi-layer perceptrons and convolutional neural networks. Our method
deterministically quantizes layers in an iterative fashion with no complicated
re-training required. Specifically, we quantize each neuron, or hidden unit,
using a greedy path-following algorithm. This simple algorithm is equivalent to
running a dynamical system, which we prove is stable for quantizing a
single-layer neural network (or, alternatively, for quantizing the first layer
of a multi-layer network) when the training data are Gaussian. We show that
under these assumptions, the quantization error decays with the width of the
layer, i.e., its level of over-parametrization. We provide numerical
experiments, on multi-layer networks, to illustrate the performance of our
methods on MNIST and CIFAR10 data, as well as for quantizing the VGG16 network
using ImageNet data.
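The greedy path-following step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation: the function name `greedy_quantize_neuron`, the ternary alphabet, and the exact update rule (choose each quantized weight to minimize the norm of a running error vector, then carry that error forward) are assumptions that go beyond what this listing states.

```python
import numpy as np


def greedy_quantize_neuron(w, X, alphabet):
    """Greedily quantize one neuron (hypothetical helper, assumed update rule).

    w        : (N,) real-valued weights of a single neuron
    X        : (m, N) data matrix; column t is the feature paired with w[t]
    alphabet : iterable of allowed quantized values
    Returns (q, u) where q is the quantized weight vector and
    u = X @ (w - q) is the accumulated error state.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    alphabet = np.asarray(list(alphabet), dtype=float)

    u = np.zeros(X.shape[0])      # state of the dynamical system
    q = np.zeros_like(w)

    for t in range(w.shape[0]):
        x_t = X[:, t]
        norm_sq = x_t @ x_t
        # The real-valued minimizer of ||u + (w[t] - p) * x_t||_2 over p.
        if norm_sq == 0.0:
            target = w[t]
        else:
            target = w[t] + (x_t @ u) / norm_sq
        # The objective is quadratic in p, so the best alphabet element
        # is the one closest to the real-valued minimizer.
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]
        # Carry the remaining error forward to the next weight.
        u = u + (w[t] - q[t]) * x_t

    return q, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 256))    # Gaussian data, as in the abstract
    w = rng.uniform(-1.0, 1.0, size=256)
    q, u = greedy_quantize_neuron(w, X, [-1.0, 0.0, 1.0])
    rel_err = np.linalg.norm(u) / np.linalg.norm(X @ w)
    print("relative error of the quantized neuron output:", rel_err)
```

Each weight is visited once and the work per weight is a single inner product with the error state, which is consistent with the abstract's claim that no re-training is needed. Quantizing a whole layer in this sketch would amount to calling the function once per neuron.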
Related papers
- Frame Quantization of Neural Networks [2.8720213314158234]
We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory.
We derive an error bound between the original neural network and the quantized neural network in terms of step size and the number of frame elements.
arXiv Detail & Related papers (2024-04-11T21:24:38Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization [5.982922468400901]
We show that it is possible to achieve error bounds equivalent to those obtained in the order of the weights of a neural layer.
We prove that it is possible to achieve full-network bounds under an infinite alphabet and minimal assumptions on the input data.
arXiv Detail & Related papers (2023-09-20T00:35:16Z)
- A simple approach for quantizing neural networks [7.056222499095849]
We propose a new method for quantizing the weights of a fully trained neural network.
A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization.
The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
arXiv Detail & Related papers (2022-09-07T22:36:56Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Training Quantized Deep Neural Networks via Cooperative Coevolution [27.967480639403796]
We propose a new method for quantizing deep neural networks (DNNs).
Under the framework of cooperative coevolution, we use the estimation of distribution algorithm to search for the low-bits weights.
Experiments show that our method can train a 4-bit ResNet-20 on the CIFAR-10 dataset without sacrificing accuracy.
arXiv Detail & Related papers (2021-12-23T09:13:13Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)