Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tiles
- URL: http://arxiv.org/abs/2209.12982v1
- Date: Mon, 26 Sep 2022 19:29:51 GMT
- Title: Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tiles
- Authors: Renzo Andri, Beatrice Bussolino, Antonio Cipolletta, Lukas Cavigelli,
Zhe Wang
- Abstract summary: The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm.
We propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles.
We show how to integrate such custom modules in an industrial-grade, programmable DSA.
- Score: 7.705762754955851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of today's computer vision pipelines are built around deep neural
networks, where convolution operations require most of the generally high
compute effort. The Winograd convolution algorithm computes convolutions with
fewer MACs compared to the standard algorithm, reducing the operation count by
a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized
tiles $F_2$. Even though the gain is significant, the Winograd algorithm with
larger tile sizes, i.e., $F_4$, offers even more potential in improving
throughput and energy efficiency, as it reduces the required MACs by 4x.
Unfortunately, the Winograd algorithm with larger tile sizes introduces
numerical issues that prevent its use on integer domain-specific accelerators,
and it incurs higher computational overhead to transform input and output data
between the spatial and Winograd domains.
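The MAC reductions quoted above follow directly from the tile arithmetic: 2D Winograd F(m x m, r x r) needs one multiplication per tap of the (m+r-1) x (m+r-1) transformed tile, versus r*r MACs for each of the m*m output pixels in the direct algorithm. A minimal sketch of this counting (the function name is illustrative, not from the paper):

```python
def winograd_mult_reduction(m: int, r: int) -> float:
    """Multiplication reduction of 2D Winograd F(m x m, r x r)
    over direct convolution, counting element-wise multiplies only."""
    direct = m * m * r * r          # m*m output pixels, r*r MACs each
    winograd = (m + r - 1) ** 2     # one multiply per tap of the transformed tile
    return direct / winograd

print(winograd_mult_reduction(2, 3))  # F_2 with 3x3 kernels: 36/16 = 2.25
print(winograd_mult_reduction(4, 3))  # F_4 with 3x3 kernels: 144/36 = 4.0
```

Note that this counts only the element-wise multiplications; the input, filter, and output transformations add further operations, which is the overhead mentioned above.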
To unlock the full potential of Winograd $F_4$, we propose a novel tap-wise
quantization method that overcomes the numerical issues of using larger tiles,
enabling integer-only inference. Moreover, we present custom hardware units
that process the Winograd transformations in a power- and area-efficient way,
and we show how to integrate such custom modules in an industrial-grade,
programmable DSA. An extensive experimental evaluation on a large set of
state-of-the-art computer vision benchmarks reveals that the tap-wise
quantization algorithm makes the quantized Winograd $F_4$ network almost as
accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x
gain in energy efficiency and up to 1.83x end-to-end speed-up for
state-of-the-art segmentation and detection networks.
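To illustrate the tap-wise idea, the core intuition is that each tap position of the Winograd-domain tile has its own dynamic range, so each tap gets its own quantization scale rather than one scale per tensor. The following NumPy sketch is an assumption-laden illustration of that idea (function name, shapes, and the max-based calibration are hypothetical, not the paper's implementation):

```python
import numpy as np

def quantize_per_tap(winograd_tiles: np.ndarray, n_bits: int = 8):
    """Per-tap symmetric quantization of Winograd-domain tiles (sketch).

    winograd_tiles: shape (N, 6, 6), i.e. N transformed F_4 input tiles.
    Each of the 36 tap positions receives its own scale, since the taps
    produced by the F_4 transforms have very different dynamic ranges.
    """
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for int8
    # one scale per tap position, shared across all tiles (max calibration)
    scale = np.abs(winograd_tiles).max(axis=0) / qmax  # shape (6, 6)
    scale = np.where(scale == 0, 1.0, scale)           # avoid div-by-zero
    q = np.clip(np.round(winograd_tiles / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

# hypothetical usage with random stand-in tiles
tiles = np.random.randn(32, 6, 6).astype(np.float32)
q, scale = quantize_per_tap(tiles)
deq = q.astype(np.float32) * scale  # dequantize to check the error
```

With per-tap scales, the quantization error at each tap is bounded by half that tap's scale, instead of being dominated by the tap with the largest range.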
Related papers
- Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z) - MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor
Formula for Image Dehazing [88.61523825903998]
Transformer networks are beginning to replace pure convolutional neural networks (CNNs) in the field of computer vision.
We propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity.
We introduce a multi-branch architecture with multi-scale patch embedding to the proposed Transformer, which embeds features by overlapping deformable convolution of different scales.
Our model, named Multi-branch Transformer expanded by Taylor formula (MB-TaylorFormer), can embed coarse-to-fine features more flexibly at the patch embedding stage and capture long-distance pixel interactions with limited computational cost.
arXiv Detail & Related papers (2023-08-27T08:10:23Z) - Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture
with Task-level Sparsity via Mixture-of-Experts [60.1586169973792]
M$^3$ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE).
MoE achieves better accuracy and over an 80% reduction in computation but leaves challenges for efficient deployment on FPGA.
Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations.
arXiv Detail & Related papers (2023-05-30T02:24:03Z) - Winograd Algorithm for AdderNet [54.93995545896655]
Adder neural network (AdderNet) is a new kind of deep model that replaces the original massive multiplications in convolutions by additions.
This paper studies the Winograd algorithm, a widely used fast algorithm for accelerating convolution and saving computational cost.
arXiv Detail & Related papers (2021-05-12T09:13:34Z) - Accelerating Large Kernel Convolutions with Nested Winograd
Transformation [2.193040410545991]
This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions.
Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
arXiv Detail & Related papers (2021-02-26T02:42:42Z) - Efficient Residue Number System Based Winograd Convolution [15.210764522845416]
The Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to the Residue Number System (RNS).
The minimal-complexity convolution is computed precisely over large transformation tiles.
arXiv Detail & Related papers (2020-07-23T19:07:06Z) - LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z) - Region adaptive graph fourier transform for 3d point clouds [51.193111325231165]
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes.
The RA-GFT achieves better complexity-performance trade-offs than previous approaches.
arXiv Detail & Related papers (2020-03-04T02:47:44Z) - XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv)
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z) - Searching for Winograd-aware Quantized Networks [12.351250944079949]
We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
arXiv Detail & Related papers (2020-02-25T07:53:53Z) - DWM: A Decomposable Winograd Method for Convolution Acceleration [29.312042061351782]
Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing.
It suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and it fails on convolutions with stride larger than 1.
We propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to wide and general convolutions.
arXiv Detail & Related papers (2020-02-03T03:42:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.