LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units
- URL: http://arxiv.org/abs/2003.08646v3
- Date: Tue, 28 Jul 2020 13:15:20 GMT
- Title: LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units
- Authors: Guangli Li, Lei Liu, Xueying Wang, Xiu Ma, Xiaobing Feng
- Abstract summary: We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
- Score: 6.110973485878557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerating deep convolutional neural networks has become an active topic
and sparked an interest in academia and industry. In this paper, we propose an
efficient low-precision quantized Winograd convolution algorithm, called LANCE,
which combines the advantages of fast convolution and quantization techniques.
By embedding linear quantization operations into the Winograd-domain, the fast
convolution can be performed efficiently under low-precision computation on
graphics processing units. We test neural network models with LANCE on
representative image classification datasets, including SVHN, CIFAR, and
ImageNet. The experimental results show that our 8-bit quantized Winograd
convolution improves the performance by up to 2.40x over the full-precision
convolution with trivial accuracy loss.
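
The key step in LANCE is embedding linear quantization into the Winograd domain: the input and kernel tiles are transformed first, the transformed operands are quantized, and the element-wise multiplications are then carried out in low-precision integer arithmetic before the result is dequantized and transformed back. The NumPy sketch below illustrates this for a single F(2x2, 3x3) tile; it is a minimal illustration of the idea, not the authors' GPU implementation, and the symmetric per-tensor scaling in quantize_int8 is an assumption made for the example.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def quantize_int8(x):
    """Symmetric linear quantization to int8; returns (values, scale). Assumed scheme."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def winograd_f2x2_3x3_int8(tile, kernel):
    """One 2x2 output tile with the Winograd-domain product done in int8."""
    V = B_T @ tile @ B_T.T            # transformed 4x4 input tile
    U = G @ kernel @ G.T              # transformed 4x4 kernel
    qV, sV = quantize_int8(V)         # quantize in the Winograd domain
    qU, sU = quantize_int8(U)
    # Low-precision element-wise product, accumulated in int32, then dequantized.
    M = (qV.astype(np.int32) * qU.astype(np.int32)).astype(np.float32) * (sV * sU)
    return A_T @ M @ A_T.T            # inverse transform to the spatial domain

# Check against a direct 3x3 correlation over one 4x4 tile.
d = np.random.randn(4, 4).astype(np.float32)
g = np.random.randn(3, 3).astype(np.float32)
ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)] for i in range(2)])
print(np.max(np.abs(winograd_f2x2_3x3_int8(d, g) - ref)))  # small, quantization-level error
```

In the paper's GPU setting, these int8 element-wise products are the part executed with low-precision integer arithmetic, which is where the reported speedup over full-precision convolution comes from.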
Related papers
- Towards Efficient Verification of Quantized Neural Networks [9.352320240912109]
Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models.
We show how efficiency can be improved by utilizing gradient-based search methods and also bound-propagation techniques.
arXiv Detail & Related papers (2023-12-20T00:43:13Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Quantized Proximal Averaging Network for Analysis Sparse Coding [23.080395291046408]
We unfold an iterative algorithm into a trainable network that facilitates learning sparsity prior to quantization.
We demonstrate applications to compressed image recovery and magnetic resonance image reconstruction.
arXiv Detail & Related papers (2021-05-13T12:05:35Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (bit-widths of 4 or lower) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking [0.0]
An optimisation of visual object tracking using a Siamese neural network for embedded vision systems is presented.
It was assumed that the solution should operate in real time, preferably on a high-resolution video stream.
arXiv Detail & Related papers (2020-07-01T13:49:56Z)
- Quantaized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical Polynomials Base [0.0]
The Winograd convolution algorithm is a commonly used method that significantly reduces time consumption.
We present the application of a base change technique to a quantized Winograd-aware training model.
arXiv Detail & Related papers (2020-04-23T11:15:27Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization for ANY gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
- Searching for Winograd-aware Quantized Networks [12.351250944079949]
We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
arXiv Detail & Related papers (2020-02-25T07:53:53Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computation complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.