Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tiles
- URL: http://arxiv.org/abs/2209.12982v1
- Date: Mon, 26 Sep 2022 19:29:51 GMT
- Title: Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tiles
- Authors: Renzo Andri, Beatrice Bussolino, Antonio Cipolletta, Lukas Cavigelli,
Zhe Wang
- Abstract summary: The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm.
We propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles.
We show how to integrate such custom modules in an industrial-grade, programmable DSA.
- Score: 7.705762754955851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of today's computer vision pipelines are built around deep neural
networks, where convolution operations require most of the generally high
compute effort. The Winograd convolution algorithm computes convolutions with
fewer MACs compared to the standard algorithm, reducing the operation count by
a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized
tiles $F_2$. Even though the gain is significant, the Winograd algorithm with
larger tile sizes, i.e., $F_4$, offers even more potential in improving
throughput and energy efficiency, as it reduces the required MACs by 4x.
Unfortunately, the Winograd algorithm with larger tile sizes introduces
numerical issues that prevent its use on integer domain-specific accelerators,
and it incurs higher computational overhead to transform input and output data
between the spatial and Winograd domains.
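The MAC reductions quoted above follow directly from the tile arithmetic: 2D Winograd F(m x m, r x r) needs one multiplication per tap of the (m+r-1) x (m+r-1) transformed tile, versus r*r MACs for each of the m*m output pixels in the direct algorithm. A minimal sketch of this counting (the function name is illustrative, not from the paper):

```python
def winograd_mult_reduction(m: int, r: int) -> float:
    """Multiplication reduction of 2D Winograd F(m x m, r x r)
    over direct convolution, counting element-wise multiplies only."""
    direct = m * m * r * r          # m*m output pixels, r*r MACs each
    winograd = (m + r - 1) ** 2     # one multiply per tap of the transformed tile
    return direct / winograd

print(winograd_mult_reduction(2, 3))  # F_2 with 3x3 kernels: 36/16 = 2.25
print(winograd_mult_reduction(4, 3))  # F_4 with 3x3 kernels: 144/36 = 4.0
```

Note that this counts only the element-wise multiplications; the input, filter, and output transformations add further operations, which is the overhead mentioned above.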
To unlock the full potential of Winograd $F_4$, we propose a novel tap-wise
quantization method that overcomes the numerical issues of using larger tiles,
enabling integer-only inference. Moreover, we present custom hardware units
that process the Winograd transformations in a power- and area-efficient way,
and we show how to integrate such custom modules in an industrial-grade,
programmable DSA. An extensive experimental evaluation on a large set of
state-of-the-art computer vision benchmarks reveals that the tap-wise
quantization algorithm makes the quantized Winograd $F_4$ network almost as
accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x
gain in energy efficiency and up to 1.83x end-to-end speed-up for
state-of-the-art segmentation and detection networks.
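To illustrate the tap-wise idea, the core intuition is that each tap position of the Winograd-domain tile has its own dynamic range, so each tap gets its own quantization scale rather than one scale per tensor. The following NumPy sketch is an assumption-laden illustration of that idea (function name, shapes, and the max-based calibration are hypothetical, not the paper's implementation):

```python
import numpy as np

def quantize_per_tap(winograd_tiles: np.ndarray, n_bits: int = 8):
    """Per-tap symmetric quantization of Winograd-domain tiles (sketch).

    winograd_tiles: shape (N, 6, 6), i.e. N transformed F_4 input tiles.
    Each of the 36 tap positions receives its own scale, since the taps
    produced by the F_4 transforms have very different dynamic ranges.
    """
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for int8
    # one scale per tap position, shared across all tiles (max calibration)
    scale = np.abs(winograd_tiles).max(axis=0) / qmax  # shape (6, 6)
    scale = np.where(scale == 0, 1.0, scale)           # avoid div-by-zero
    q = np.clip(np.round(winograd_tiles / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

# hypothetical usage with random stand-in tiles
tiles = np.random.randn(32, 6, 6).astype(np.float32)
q, scale = quantize_per_tap(tiles)
deq = q.astype(np.float32) * scale  # dequantize to check the error
```

With per-tap scales, the quantization error at each tap is bounded by half that tap's scale, instead of being dominated by the tap with the largest range.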
Related papers
- Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z) - MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor
Formula for Image Dehazing [88.61523825903998]
Transformer networks are beginning to replace pure convolutional neural networks (CNNs) in the field of computer vision.
We propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity.
We introduce a multi-branch architecture with multi-scale patch embedding to the proposed Transformer, which embeds features by overlapping deformable convolution of different scales.
Our model, named Multi-branch Transformer expanded by Taylor formula (MB-TaylorFormer), can embed coarse-to-fine features more flexibly at the patch embedding stage and capture long-distance pixel interactions with limited computational cost.
arXiv Detail & Related papers (2023-08-27T08:10:23Z) - Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture
with Task-level Sparsity via Mixture-of-Experts [60.1586169973792]
M$^3$ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE).
MoE achieves better accuracy and over an 80% reduction in computation but leaves challenges for efficient deployment on FPGA.
Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations.
arXiv Detail & Related papers (2023-05-30T02:24:03Z) - Winograd Algorithm for AdderNet [54.93995545896655]
Adder neural network (AdderNet) is a new kind of deep model that replaces the original massive multiplications in convolutions by additions.
This paper studies the Winograd algorithm, a widely used fast algorithm for accelerating convolution and saving computational cost.
arXiv Detail & Related papers (2021-05-12T09:13:34Z) - Accelerating Large Kernel Convolutions with Nested Winograd
Transformation [2.193040410545991]
This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions.
Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
arXiv Detail & Related papers (2021-02-26T02:42:42Z) - Efficient Residue Number System Based Winograd Convolution [15.210764522845416]
The Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to the Residue Number System (RNS).
The minimal-complexity convolution is computed precisely over large transformation tiles.
arXiv Detail & Related papers (2020-07-23T19:07:06Z) - LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z) - Region adaptive graph fourier transform for 3d point clouds [51.193111325231165]
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes.
The RA-GFT achieves better complexity-performance trade-offs than previous approaches.
arXiv Detail & Related papers (2020-03-04T02:47:44Z) - XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv)
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z) - Searching for Winograd-aware Quantized Networks [12.351250944079949]
We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
arXiv Detail & Related papers (2020-02-25T07:53:53Z) - DWM: A Decomposable Winograd Method for Convolution Acceleration [29.312042061351782]
Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing.
It suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and it fails on convolutions with stride larger than 1.
We propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to wide and general convolutions.
arXiv Detail & Related papers (2020-02-03T03:42:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.