Efficient Residue Number System Based Winograd Convolution
- URL: http://arxiv.org/abs/2007.12216v1
- Date: Thu, 23 Jul 2020 19:07:06 GMT
- Title: Efficient Residue Number System Based Winograd Convolution
- Authors: Zhi-Gang Liu and Matthew Mattina
- Abstract summary: The Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to the Residue Number System (RNS).
The minimal-complexity convolution is computed exactly over large transformation tiles.
- Score: 15.210764522845416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior research has shown that the Winograd algorithm can reduce the computational
complexity of convolutional neural networks (CNNs) with weights and activations
represented in floating point. However, it is difficult to apply the scheme to
the inference of low-precision quantized (e.g., INT8) networks. Our work extends
the Winograd algorithm to the Residue Number System (RNS). The minimal-complexity
convolution is computed exactly over large transformation tiles (e.g., 10 x 10
to 16 x 16) of filters and activation patches using the Winograd transformation
and low-cost (e.g., 8-bit) arithmetic, without degrading the prediction accuracy
of the networks during inference. The arithmetic complexity reduction is up to
7.03x, while the performance improvement is up to 2.30x and 4.69x for 3 x 3 and
5 x 5 filters, respectively.
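As an illustration of the core idea, the following is a minimal, hedged sketch (not the authors' implementation) of a Winograd convolution carried out in a Residue Number System: the small F(2,3) case is computed independently modulo a few coprime 8-bit moduli, and the exact integer result is recovered with the Chinese Remainder Theorem. The specific moduli, tile size, and NumPy formulation are assumptions for illustration; the paper targets much larger tiles (10 x 10 to 16 x 16).

```python
# A minimal, hedged sketch of RNS-based Winograd convolution (F(2,3) for
# clarity; the paper targets much larger tiles).  Moduli, tile size, and the
# NumPy formulation are illustrative assumptions, not the authors' code.
# Requires Python 3.8+ for pow(x, -1, m) modular inverses.
import numpy as np

MODULI = [251, 241, 239]            # pairwise coprime, each fits in 8 bits
M = 251 * 241 * 239                 # dynamic range of the residue system

# Standard F(2,3) Winograd transform matrices (A^T and B^T are integer).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.int64)
AT = np.array([[1,  1,  1,  0],
               [0,  1, -1, -1]], dtype=np.int64)

def winograd_f23_rns(d, g):
    """F(2,3): 4-element input tile d, 3-tap filter g (integers) -> 2 outputs."""
    residues = []
    for m in MODULI:
        inv2 = pow(2, -1, m)        # the 1/2 entries of G become 2^-1 mod m
        G = np.array([[1,     0,     0],
                      [inv2,  inv2,  inv2],
                      [inv2, -inv2,  inv2],
                      [0,     0,     1]], dtype=np.int64) % m
        U = (G @ (np.asarray(g, dtype=np.int64) % m)) % m   # filter transform
        V = (BT @ (np.asarray(d, dtype=np.int64) % m)) % m  # input transform
        residues.append((AT @ ((U * V) % m)) % m)           # all low-cost mod-m ops
    # Chinese Remainder Theorem: recover the exact integer outputs.
    out = [0, 0]
    for m, y in zip(MODULI, residues):
        Mi = M // m
        coeff = Mi * pow(Mi, -1, m)
        for i in range(2):
            out[i] = (out[i] + int(y[i]) * coeff) % M
    # Map from [0, M) back to the signed range (results here are far below M/2).
    return np.array([v - M if v > M // 2 else v for v in out], dtype=np.int64)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.integers(-128, 128, size=4)     # INT8 activation tile
    g = rng.integers(-128, 128, size=3)     # INT8 filter
    direct = np.array([d[i] * g[0] + d[i+1] * g[1] + d[i+2] * g[2]
                       for i in range(2)])
    assert np.array_equal(winograd_f23_rns(d, g), direct)
    print("RNS Winograd:", winograd_f23_rns(d, g), "direct:", direct)
```

Because every intermediate value stays below its modulus, the transforms and the element-wise products need only low-cost modular arithmetic in each residue channel, which is the property the paper exploits to make large transformation tiles exact.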
Related papers
- Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tile [7.705762754955851]
The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm.
We propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles.
We show how to integrate such custom modules in an industrial-grade, programmable DSA.
arXiv Detail & Related papers (2022-09-26T19:29:51Z) - Winograd Algorithm for AdderNet [54.93995545896655]
Adder neural network (AdderNet) is a new kind of deep model that replaces the massive multiplications in convolutions with additions (an illustrative sketch of the adder operation appears after this list).
This paper studies the Winograd algorithm, a widely used fast algorithm for accelerating convolution and reducing computational cost.
arXiv Detail & Related papers (2021-05-12T09:13:34Z) - A Survey of Quantization Methods for Efficient Neural Network Inference [75.55159744950859]
Quantization is the problem of distributing continuous real-valued numbers over a fixed discrete set of numbers to minimize the number of bits required.
It has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas.
Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x (an illustrative sketch of uniform quantization appears after this list).
arXiv Detail & Related papers (2021-03-25T06:57:11Z) - Accelerating Large Kernel Convolutions with Nested Winograd
Transformation [2.193040410545991]
This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions.
Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
arXiv Detail & Related papers (2021-02-26T02:42:42Z) - HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework.
arXiv Detail & Related papers (2020-11-20T23:51:43Z) - WrapNet: Neural Net Inference with Ultra-Low-Resolution Arithmetic [57.07483440807549]
We propose a method that adapts neural networks to use low-resolution (8-bit) additions in the accumulators, achieving classification accuracy comparable to their 32-bit counterparts.
We demonstrate the efficacy of our approach on both software and hardware platforms.
arXiv Detail & Related papers (2020-07-26T23:18:38Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Quantaized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical
Polynomials Base [0.0]
The Winograd convolution algorithm is a commonly used method that significantly reduces time consumption.
We present the application of a base-change technique to quantized Winograd-aware training models.
arXiv Detail & Related papers (2020-04-23T11:15:27Z) - LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z) - Computational optimization of convolutional neural networks using
separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing (see the separable-filter sketch after this list).
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z) - DWM: A Decomposable Winograd Method for Convolution Acceleration [29.312042061351782]
Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing.
It suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and it fails on convolutions with stride larger than 1.
We propose a novel Decomposable Winograd Method (DWM), which overcomes the limitations of the original Winograd minimal filtering algorithm and extends it to a wide range of general convolutions.
arXiv Detail & Related papers (2020-02-03T03:42:56Z)
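For the AdderNet entry, the following is an illustrative sketch (assumed shapes and a simple sliding-window loop, not that paper's implementation) of the adder operation: the output of each "convolution" unit is the negative L1 distance between the input patch and the filter, so no multiplications are needed.

```python
# Hedged sketch of an AdderNet-style output unit: multiply-accumulate is
# replaced by the negative L1 distance between patch and filter.
import numpy as np

def adder_conv2d_single(x, w):
    """Valid-mode 2D adder 'convolution' with one single-channel filter."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            y[i, j] = -np.sum(np.abs(patch - w))   # additions/subtractions only
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8)).astype(np.float32)
    w = rng.standard_normal((3, 3)).astype(np.float32)
    print(adder_conv2d_single(x, w).shape)         # (6, 6)
```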
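For the quantization survey entry, the sketch below illustrates symmetric, per-tensor uniform quantization to INT8 or INT4 in NumPy. The scale convention and clipping range are common practice, assumed here rather than taken from any listed paper.

```python
# Hedged illustration of symmetric, per-tensor uniform quantization.
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Map a float tensor to signed integers with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8, 7 for INT4
    scale = float(np.max(np.abs(x))) / qmax
    scale = scale if scale > 0 else 1.0     # guard against an all-zero tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale                         # INT4 values stored in an int8 container

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal((3, 3)).astype(np.float32)
    for bits in (8, 4):
        q, s = quantize_symmetric(w, num_bits=bits)
        err = np.max(np.abs(w - dequantize(q, s)))
        print(f"INT{bits} max reconstruction error: {err:.4f}")
```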
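For the "separated filters" entry, one common construction (assumed here for illustration, not necessarily that paper's exact method) factors a 2D kernel into a column filter and a row filter via a rank-1 SVD, so one KxK convolution becomes two cheap 1D convolutions.

```python
# Hedged sketch: rank-1 (separable) factorization of a 2D kernel via SVD,
# replacing one KxK convolution with a vertical and a horizontal 1D pass.
import numpy as np
from scipy.signal import convolve2d

def separate_kernel(kernel):
    """Return (column, row) filters whose outer product approximates `kernel`."""
    u, s, vt = np.linalg.svd(kernel)
    col = (u[:, 0] * np.sqrt(s[0]))[:, None]   # K x 1 vertical filter
    row = (vt[0, :] * np.sqrt(s[0]))[None, :]  # 1 x K horizontal filter
    return col, row

if __name__ == "__main__":
    # A 3x3 binomial blur is exactly rank-1, so the factorization is lossless.
    k = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0]) / 16.0
    col, row = separate_kernel(k)
    x = np.random.default_rng(0).standard_normal((32, 32))
    full = convolve2d(x, k, mode="valid")
    sep = convolve2d(convolve2d(x, col, mode="valid"), row, mode="valid")
    print("max |full - separable|:", np.max(np.abs(full - sep)))
```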
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.