Quantized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical
Polynomials Base
- URL: http://arxiv.org/abs/2004.11077v1
- Date: Thu, 23 Apr 2020 11:15:27 GMT
- Title: Quantized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical
Polynomials Base
- Authors: Barbara Barabasz
- Abstract summary: The Winograd convolution algorithm is a commonly used method that significantly reduces computation time.
We present the application of a base change technique to quantized Winograd-aware training models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of how to speed up convolution computations in Deep
Neural Networks has been widely investigated in recent years. The Winograd
convolution algorithm is a commonly used method that significantly reduces
computation time. However, it suffers from reduced numerical accuracy,
particularly at lower precisions. In this paper we present the application of
a base change technique to a quantized Winograd-aware training model. We show
that we can train an $8$-bit quantized network to nearly the same accuracy (up
to 0.5% loss) on the tested network (ResNet18) and dataset (CIFAR10) as with
quantized direct convolution, with only a few additional operations in the
pre/post transformations. Keeping the Hadamard product at $9$ bits allows us
to obtain the same accuracy as direct convolution.
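To make the structure of the algorithm concrete, the sketch below builds the 1D Winograd/Toom-Cook F(2,3) transforms from their interpolation points, applies one very simple instance of a base change (a diagonal row rescaling of G compensated in B^T, which leaves the exact result unchanged), and simulates 8-bit quantization of the transformed operands. This is a minimal NumPy illustration under assumed details, not the paper's construction; names such as `winograd_matrices` and `fake_quant` are invented for the example.

```python
# Minimal sketch (not the paper's construction): 1D Winograd/Toom-Cook F(2,3)
# built from interpolation points, with a diagonal "base change" (row rescaling
# of G compensated in B^T) and simulated 8-bit quantization of the operands.
import numpy as np

def winograd_matrices(points, m, r):
    """Transforms for F(m, r) from n = m + r - 1 points; None = point at infinity."""
    n = m + r - 1
    assert len(points) == n

    def vand(cols):
        V = np.zeros((n, cols))
        for j, p in enumerate(points):
            if p is None:
                V[j, cols - 1] = 1.0      # infinity picks the leading coefficient
            else:
                V[j, :] = [p ** i for i in range(cols)]
        return V

    G = vand(r)                            # filter transform, n x r
    AT = vand(m).T                         # output transform, m x n
    BT = np.linalg.inv(vand(n)).T          # data transform, n x n
    return AT, G, BT

def fake_quant(x, bits=8):
    """Symmetric uniform quantization (simulation only)."""
    scale = max(np.max(np.abs(x)), 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

m, r = 2, 3
AT, G, BT = winograd_matrices([0.0, 1.0, -1.0, None], m, r)

# Diagonal base change: scale rows of G and compensate in B^T. The exact
# floating-point result is unchanged, but the operand ranges (and therefore
# the quantization error) are not. This D recovers the familiar 1/2 scaling.
D = np.diag([1.0, 0.5, 0.5, 1.0])
G2, BT2 = D @ G, np.linalg.inv(D) @ BT

rng = np.random.default_rng(0)
g = rng.standard_normal(r)
d = rng.standard_normal(m + r - 1)
direct = np.array([d[i:i + r] @ g for i in range(m)])    # direct correlation

for name, (Gx, BTx) in {"canonical": (G, BT), "rescaled": (G2, BT2)}.items():
    y = AT @ (fake_quant(Gx @ g) * fake_quant(BTx @ d))  # quantized Hadamard inputs
    print(f"{name:10s} max |error| vs direct: {np.max(np.abs(y - direct)):.2e}")
```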
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
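As a loose illustration of the locality-sensitive hashing idea (not the HASTE module itself), the sketch below hashes feature-map channels with sign random projections and merges channels that fall into the same bucket; all sizes and the number of hash bits are assumptions.

```python
# Generic sketch of LSH over feature-map channels (sign random projections),
# in the spirit of grouping redundant channels; not the HASTE module itself.
import numpy as np

def lsh_channel_buckets(fmap, n_bits=6, seed=0):
    """fmap: (C, H, W) activation. Returns a bucket id per channel."""
    C = fmap.shape[0]
    flat = fmap.reshape(C, -1)                       # one vector per channel
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((flat.shape[1], n_bits))
    codes = (flat @ planes > 0).astype(np.uint8)     # sign-random-projection hash
    # channels with identical codes land in the same bucket
    _, bucket_ids = np.unique(codes, axis=0, return_inverse=True)
    return bucket_ids

def merge_channels(fmap, bucket_ids):
    """Average channels within each bucket -> compressed feature map."""
    return np.stack([fmap[bucket_ids == b].mean(axis=0)
                     for b in np.unique(bucket_ids)])

rng = np.random.default_rng(1)
base = rng.standard_normal((8, 16, 16))
noise = 0.01 * rng.standard_normal((8, 16, 16))
fmap = np.concatenate([base, base + noise])          # deliberately redundant channels
buckets = lsh_channel_buckets(fmap)
print("channels:", fmap.shape[0], "-> buckets:", len(np.unique(buckets)))
print("compressed shape:", merge_channels(fmap, buckets).shape)
```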
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
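The sketch below shows a generic greedy path-following quantizer for a single linear layer in the spirit of GPFQ: weights are quantized one at a time so that the accumulated output error on calibration data is corrected at each step. The alphabet, data shapes, and function names are assumptions, not the authors' implementation.

```python
# Sketch of a greedy path-following quantizer (in the spirit of GPFQ) for a
# single linear layer; illustrative only, not the paper's code.
import numpy as np

def greedy_quantize(X, w, alphabet):
    """Quantize weight vector w against calibration data X (N x d)."""
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])                 # accumulated output error X @ (w - q)
    for t in range(len(w)):
        target = u + w[t] * X[:, t]          # error if this weight were dropped
        # pick the alphabet element that best cancels the accumulated error
        errs = [np.linalg.norm(target - a * X[:, t]) for a in alphabet]
        q[t] = alphabet[int(np.argmin(errs))]
        u = target - q[t] * X[:, t]
    return q

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))           # calibration activations (assumption)
w = rng.standard_normal(64)
step = np.max(np.abs(w)) / 7                 # a 4-bit symmetric alphabet (assumption)
alphabet = step * np.arange(-8, 8)

q_greedy = greedy_quantize(X, w, alphabet)
q_naive = step * np.clip(np.round(w / step), -8, 7)
print("relative output error, greedy:",
      np.linalg.norm(X @ (w - q_greedy)) / np.linalg.norm(X @ w))
print("relative output error, naive round:",
      np.linalg.norm(X @ (w - q_naive)) / np.linalg.norm(X @ w))
```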
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Winograd Convolution for Deep Neural Networks: Efficient Point Selection [0.8043754868448141]
We propose a novel approach to point selection using points of the form -1/c, -c, c, 1/c, drawing c from the full range of real numbers.
We find that the error for different values of c forms a rough curve across the range of real numbers, which helps to localize the values of c that reduce error.
We study a range of sizes for small convolutions and achieve reduction in error ranging from 2% to around 59% for both 1D and 2D convolutions.
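A rough way to reproduce this kind of study is sketched below: build F(4,3) transforms from the points {0, -1/c, -c, c, 1/c, inf}, run the algorithm in float32, and compare against a float64 direct convolution while sweeping c. The construction and the sampled values of c are assumptions for illustration.

```python
# Sketch: sweep the point-selection parameter c for 1D F(4,3) Winograd built
# from the points {0, -1/c, -c, c, 1/c, inf}; measure fp32 error vs direct conv.
import numpy as np

def transforms(points, m, r):
    n = m + r - 1
    def vand(cols):
        V = np.zeros((n, cols))
        for j, p in enumerate(points):
            if p is None:
                V[j, cols - 1] = 1.0          # point at infinity
            else:
                V[j, :] = [p ** i for i in range(cols)]
        return V
    return vand(m).T, vand(r), np.linalg.inv(vand(n)).T   # AT, G, BT

def winograd_error(c, m=4, r=3, trials=200, seed=0):
    AT, G, BT = transforms([0.0, -1.0 / c, -c, c, 1.0 / c, None], m, r)
    AT32, G32, BT32 = (M.astype(np.float32) for M in (AT, G, BT))
    rng = np.random.default_rng(seed)
    err = 0.0
    for _ in range(trials):
        g = rng.standard_normal(r)
        d = rng.standard_normal(m + r - 1)
        exact = np.array([d[i:i + r] @ g for i in range(m)])   # float64 reference
        y = AT32 @ ((G32 @ g.astype(np.float32)) * (BT32 @ d.astype(np.float32)))
        err += np.max(np.abs(y - exact))
    return err / trials

for c in (0.5, 0.7, 0.9, 1.25, 2.0):          # avoid c = 1 (duplicate points)
    print(f"c = {c:4.2f}   mean max error = {winograd_error(c):.3e}")
```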
arXiv Detail & Related papers (2022-01-25T15:00:54Z)
- Training Quantized Deep Neural Networks via Cooperative Coevolution [27.967480639403796]
We propose a new method for quantizing deep neural networks (DNNs).
Under the framework of cooperative coevolution, we use an estimation of distribution algorithm to search for the low-bit weights.
Experiments show that our method can train a 4-bit ResNet-20 on the CIFAR-10 dataset without sacrificing accuracy.
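The toy sketch below uses a univariate estimation-of-distribution search over a 4-bit weight codebook for a small linear layer; the paper couples this kind of distribution-based search with cooperative coevolution over groups of weights, which is omitted here. The codebook, population size, and update rule are assumptions.

```python
# Toy sketch: estimation-of-distribution search over discrete (4-bit) weights
# for a tiny linear layer; generic illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 8))
w_true = rng.standard_normal(8)
y = X @ w_true

levels = np.linspace(-2.0, 2.0, 16)              # 16 levels = 4-bit codebook (assumption)
d, K = 8, len(levels)
probs = np.full((d, K), 1.0 / K)                 # one categorical per weight

def loss(codes):
    return np.mean((X @ levels[codes] - y) ** 2)

for _ in range(150):
    pop = np.stack([
        np.array([rng.choice(K, p=probs[i]) for i in range(d)])
        for _ in range(64)
    ])                                           # sample candidate weight codes
    fitness = np.array([loss(codes) for codes in pop])
    elite = pop[np.argsort(fitness)[:16]]        # keep the best quarter
    for i in range(d):                           # re-estimate per-weight distributions
        counts = np.bincount(elite[:, i], minlength=K)
        probs[i] = 0.9 * counts / counts.sum() + 0.1 / K   # smoothed update

best = np.array([np.argmax(probs[i]) for i in range(d)])
print("final MSE with 4-bit weights:", loss(best))
```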
arXiv Detail & Related papers (2021-12-23T09:13:13Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming problem.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
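A loose illustration of the bit-drop idea (not the authors' CPQ/DropBits method) is sketched below: a fake-quantizer whose effective bit-width is randomly reduced during training, analogous to dropout acting on bits rather than neurons.

```python
# Loose illustration of the bit-drop idea: random bit-width reduction during
# training; not the authors' CPQ/DropBits implementation.
import numpy as np

def fake_quant(x, bits):
    """Symmetric uniform fake-quantization to `bits` bits."""
    scale = max(np.max(np.abs(x)), 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def dropbits_quant(x, target_bits=4, drop_prob=0.2, rng=None, training=True):
    """At training time, each of the lowest-order bits is dropped with prob. drop_prob."""
    if not training:
        return fake_quant(x, target_bits)
    rng = rng or np.random.default_rng()
    dropped = rng.random(target_bits - 2) < drop_prob   # never go below 2 bits
    return fake_quant(x, target_bits - int(dropped.sum()))

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
for step in range(5):
    wq = dropbits_quant(w, target_bits=4, drop_prob=0.3, rng=rng)
    print(f"step {step}: quantization MSE = {np.mean((w - wq) ** 2):.4f}")
```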
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Efficient Residue Number System Based Winograd Convolution [15.210764522845416]
The Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to the Residue Number System (RNS).
The minimal-complexity convolution is computed exactly over large transformation tiles.
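The sketch below illustrates only the arithmetic ingredient: a Winograd F(2,3) tile evaluated exactly modulo several coprime moduli, with 1/2 replaced by the modular inverse of 2, and the integer result recovered by the Chinese Remainder Theorem. The moduli and tile size are assumptions, not the paper's configuration.

```python
# Sketch of the RNS idea on a 1D Winograd F(2,3) tile: exact evaluation modulo
# several coprime moduli, then Chinese Remainder Theorem reconstruction.
import numpy as np

MODULI = (251, 241, 239)                       # pairwise coprime (assumption)

def f23_mod(g, d, p):
    """Winograd F(2,3) correlation of int filter g (3,) and data d (4,), mod p."""
    inv2 = pow(2, -1, p)                       # modular inverse of 2
    G  = np.array([[1, 0, 0], [1, 1, 1], [1, -1, 1], [0, 0, 1]]) % p
    BT = np.array([[1, 0, -1, 0],
                   [0, inv2, inv2, 0],
                   [0, -inv2, inv2, 0],
                   [0, -1, 0, 1]]) % p
    AT = np.array([[1, 1, 1, 0], [0, 1, -1, 1]]) % p
    u = (G @ (g % p)) % p
    v = (BT @ (d % p)) % p
    return (AT @ (u * v % p)) % p

def crt(residues):
    """Reconstruct the signed integer from its residues modulo MODULI."""
    M = int(np.prod(MODULI))
    x = 0
    for r, p in zip(residues, MODULI):
        Mp = M // p
        x = (x + int(r) * Mp * pow(Mp, -1, p)) % M
    return x - M if x > M // 2 else x          # map back to a signed value

rng = np.random.default_rng(0)
g = rng.integers(-8, 8, size=3)
d = rng.integers(-8, 8, size=4)
exact = np.array([d[i:i + 3] @ g for i in range(2)])
rns = [f23_mod(g, d, p) for p in MODULI]       # one residue channel per modulus
recovered = [crt([rns[k][i] for k in range(len(MODULI))]) for i in range(2)]
print("direct:", exact, " RNS+CRT:", np.array(recovered))
```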
arXiv Detail & Related papers (2020-07-23T19:07:06Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
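A minimal sketch of the Bounded ReLU idea follows: clipping activations to a fixed bound keeps their range known in advance, so uniform activation quantization has a bounded step size. The bound of 6 and the shapes are assumptions.

```python
# Minimal sketch of Bounded ReLU plus uniform activation quantization.
import numpy as np

def bounded_relu(x, bound=6.0):
    return np.clip(x, 0.0, bound)

def quantize_activation(x, bound=6.0, bits=8):
    """Uniform quantization of activations known to lie in [0, bound]."""
    scale = bound / (2 ** bits - 1)
    return np.round(np.clip(x, 0.0, bound) / scale) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(10000) * 3.0
a = bounded_relu(x)
aq = quantize_activation(x)
print("max quantization error with bounded range:", np.max(np.abs(a - aq)))
# An unbounded ReLU makes the activation range data dependent, so the scale
# (and hence the error) can blow up on outliers; that is the accuracy drop the
# paper attributes to activation quantization.
```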
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computation complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
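One common form of filter separation is sketched below: approximate a k x k kernel by a few rank-1 (vertical times horizontal) filters obtained from its SVD. This is in the spirit of a separated-filters architecture, not necessarily the authors' exact transformation; the kept rank and sizes are assumptions.

```python
# Sketch of separated filters: low-rank (separable) approximation of a 2D
# convolution kernel via SVD, reducing k*k MACs per output to about 2*k*rank.
import numpy as np
from scipy.signal import correlate2d   # 2D cross-correlation

rng = np.random.default_rng(0)
k = 5
kernel = rng.standard_normal((k, k))
image = rng.standard_normal((64, 64))

U, s, Vt = np.linalg.svd(kernel)
rank = 2                                               # kept rank (assumption)
full = correlate2d(image, kernel, mode="valid")        # reference result

approx = np.zeros_like(full)
for i in range(rank):
    col = (U[:, i] * s[i]).reshape(k, 1)               # vertical 1D filter
    row = Vt[i].reshape(1, k)                          # horizontal 1D filter
    approx += correlate2d(correlate2d(image, col, mode="valid"), row, mode="valid")

rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"rank-{rank} separable approximation, relative error: {rel_err:.3f}")
print(f"MACs per output: full = {k * k}, separable = {2 * k * rank}")
```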
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
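A toy single-channel version of the adder operation is sketched below: each output is the negative L1 distance between the filter and the corresponding input patch, so the forward pass needs only additions and subtractions. Shapes are illustrative.

```python
# Sketch of the AdderNet idea: replace multiply-accumulate similarity with a
# negative L1 distance between the filter and each patch (additions only).
import numpy as np

def adder_conv2d(image, kernel):
    """'Convolution' where each output is -sum(|patch - kernel|)."""
    k = kernel.shape[0]
    H, W = image.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + k, x:x + k]
            out[y, x] = -np.abs(patch - kernel).sum()   # additions only
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
print(adder_conv2d(image, kernel).shape)        # (6, 6)
# AdderNets pair this operation with a tailored full-precision gradient during
# training, since the sign-based gradient of |.| is too coarse on its own.
```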
arXiv Detail & Related papers (2019-12-31T06:56:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.