LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units
- URL: http://arxiv.org/abs/2003.08646v3
- Date: Tue, 28 Jul 2020 13:15:20 GMT
- Title: LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units
- Authors: Guangli Li, Lei Liu, Xueying Wang, Xiu Ma, Xiaobing Feng
- Abstract summary: We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
- Score: 6.110973485878557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerating deep convolutional neural networks has become an active topic
and sparked an interest in academia and industry. In this paper, we propose an
efficient low-precision quantized Winograd convolution algorithm, called LANCE,
which combines the advantages of fast convolution and quantization techniques.
By embedding linear quantization operations into the Winograd-domain, the fast
convolution can be performed efficiently under low-precision computation on
graphics processing units. We test neural network models with LANCE on
representative image classification datasets, including SVHN, CIFAR, and
ImageNet. The experimental results show that our 8-bit quantized Winograd
convolution improves the performance by up to 2.40x over the full-precision
convolution with trivial accuracy loss.
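
The key step in LANCE is embedding linear quantization into the Winograd domain: the input and kernel tiles are transformed first, the transformed operands are quantized, and the element-wise multiplications are then carried out in low-precision integer arithmetic before the result is dequantized and transformed back. The NumPy sketch below illustrates this for a single F(2x2, 3x3) tile; it is a minimal illustration of the idea, not the authors' GPU implementation, and the symmetric per-tensor scaling in quantize_int8 is an assumption made for the example.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def quantize_int8(x):
    """Symmetric linear quantization to int8; returns (values, scale). Assumed scheme."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def winograd_f2x2_3x3_int8(tile, kernel):
    """One 2x2 output tile with the Winograd-domain product done in int8."""
    V = B_T @ tile @ B_T.T            # transformed 4x4 input tile
    U = G @ kernel @ G.T              # transformed 4x4 kernel
    qV, sV = quantize_int8(V)         # quantize in the Winograd domain
    qU, sU = quantize_int8(U)
    # Low-precision element-wise product, accumulated in int32, then dequantized.
    M = (qV.astype(np.int32) * qU.astype(np.int32)).astype(np.float32) * (sV * sU)
    return A_T @ M @ A_T.T            # inverse transform to the spatial domain

# Check against a direct 3x3 correlation over one 4x4 tile.
d = np.random.randn(4, 4).astype(np.float32)
g = np.random.randn(3, 3).astype(np.float32)
ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)] for i in range(2)])
print(np.max(np.abs(winograd_f2x2_3x3_int8(d, g) - ref)))  # small, quantization-level error
```

In the paper's GPU setting, these int8 element-wise products are the part executed with low-precision integer arithmetic, which is where the reported speedup over full-precision convolution comes from.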
Related papers
- Towards Efficient Verification of Quantized Neural Networks [9.352320240912109]
Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models.
We show how efficiency can be improved by utilizing gradient-based search methods and also bound-propagation techniques.
arXiv Detail & Related papers (2023-12-20T00:43:13Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Quantized Proximal Averaging Network for Analysis Sparse Coding [23.080395291046408]
We unfold an iterative algorithm into a trainable network that facilitates learning sparsity prior to quantization.
We demonstrate applications to compressed image recovery and magnetic resonance image reconstruction.
arXiv Detail & Related papers (2021-05-13T12:05:35Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (bit-widths of 4 or lower) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking [0.0]
An optimisation of visual object tracking using a Siamese neural network for embedded vision systems is presented.
It was assumed that the solution should operate in real time, preferably on a high-resolution video stream.
arXiv Detail & Related papers (2020-07-01T13:49:56Z)
- Quantaized Winograd/Toom-Cook Convolution for DNNs: Beyond Canonical Polynomials Base [0.0]
The Winograd convolution algorithm is a commonly used method that significantly reduces time consumption.
We present the application of a base change technique to a quantized Winograd-aware training model.
arXiv Detail & Related papers (2020-04-23T11:15:27Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization for ANY gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
- Searching for Winograd-aware Quantized Networks [12.351250944079949]
We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
arXiv Detail & Related papers (2020-02-25T07:53:53Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computation complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.