Accelerating Large Kernel Convolutions with Nested Winograd Transformation
- URL: http://arxiv.org/abs/2102.13272v2
- Date: Sun, 31 Dec 2023 02:55:49 GMT
- Title: Accelerating Large Kernel Convolutions with Nested Winograd Transformation
- Authors: Jingbo Jiang, Xizi Chen, Chi-Ying Tsui
- Abstract summary: This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions.
Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
- Score: 2.193040410545991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent literature has shown that convolutional neural networks (CNNs) with
large kernels outperform vision transformers (ViTs) and CNNs with stacked small
kernels in many computer vision tasks, such as object detection and image
restoration. The Winograd transformation helps reduce the number of repetitive
multiplications in convolution and is widely supported by many commercial AI
processors. Researchers have proposed accelerating large kernel convolutions by
linearly decomposing them into many small kernel convolutions and then
sequentially accelerating each small kernel convolution with the Winograd
algorithm. This work proposes a nested Winograd algorithm that iteratively
decomposes a large kernel convolution into small kernel convolutions and proves
it to be more effective than the linear decomposition Winograd transformation
algorithm. Experiments show that compared to the linear decomposition Winograd
algorithm, the proposed algorithm reduces the total number of multiplications
by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
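To make the multiplication savings concrete, the sketch below implements the standard 1-D Winograd minimal filtering algorithm F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution. This is the textbook small-kernel building block that decomposition schemes apply to each small kernel; it is an illustrative example with our own function names, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): 1-D Winograd F(2,3).
# Two outputs of a 3-tap correlation are computed with 4 multiplications
# instead of the 6 required by the direct method.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 output samples."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform.
    v0, v1, v2, v3 = d0 - d2, d1 + d2, d2 - d1, d1 - d3
    # Element-wise products: the only 4 multiplications.
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def direct_corr(d, g):
    """Reference: direct 3-tap correlation, 6 multiplications."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, -1.0, 3.0]
    g = [0.5, -1.0, 2.0]
    assert all(abs(a - b) < 1e-9
               for a, b in zip(winograd_f23(d, g), direct_corr(d, g)))
    print(winograd_f23(d, g))  # [-3.5, 8.0]
```

The linear decomposition approach mentioned in the abstract splits a large kernel into several such small kernels and applies a transform like this to each piece, whereas the nested algorithm composes the small transforms themselves; the exact nested construction and the 1.4x to 10.5x savings are detailed in the paper.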
Related papers
- ParCNetV2: Oversized Kernel with Enhanced Attention [60.141606180434195]
We introduce a convolutional neural network architecture named ParCNetV2.
It extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units.
Our method outperforms other pure convolutional neural networks as well as hybrid networks that combine CNNs and transformers.
arXiv Detail & Related papers (2022-11-14T07:22:55Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile [7.705762754955851]
The Winograd convolution algorithm computes convolutions with fewer MACs than the standard algorithm.
We propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles.
We show how to integrate such custom modules in an industrial-grade, programmable DSA.
arXiv Detail & Related papers (2022-09-26T19:29:51Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Fast and High-Quality Image Denoising via Malleable Convolutions [72.18723834537494]
We present Malleable Convolution (MalleConv), an efficient variant of dynamic convolution.
Unlike previous works, MalleConv generates a much smaller set of spatially-varying kernels from the input.
We also build an efficient denoising network using MalleConv, coined MalleNet.
arXiv Detail & Related papers (2022-01-02T18:35:20Z)
- Fast Convolution based on Winograd Minimum Filtering: Introduction and Development [5.192451499848539]
Convolution operators are the fundamental component of convolutional neural networks.
In recent years, researchers have proposed several fast convolution algorithms including FFT and Winograd.
This article summarizes the development of Winograd convolution from the aspects of algorithm expansion, algorithm optimization, implementation, and application.
arXiv Detail & Related papers (2021-11-01T14:39:56Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose Content-aware Convolution (CAC), which automatically detects smooth windows and applies a 1x1 convolutional kernel in place of the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Winograd Algorithm for AdderNet [54.93995545896655]
Adder neural network (AdderNet) is a new kind of deep model that replaces the massive multiplications in convolutions with additions.
This paper studies the Winograd algorithm, a widely used fast algorithm for accelerating convolution and reducing computational cost.
arXiv Detail & Related papers (2021-05-12T09:13:34Z)
- Efficient Residue Number System Based Winograd Convolution [15.210764522845416]
The Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to the Residue Number System (RNS).
The minimal-complexity convolution is computed exactly over large transformation tiles.
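For readers unfamiliar with RNS, the short sketch below shows the basic idea that makes such an extension attractive: an integer is represented by its residues modulo several pairwise-coprime moduli, multiplication is performed independently on each small residue channel, and the exact result is recovered with the Chinese Remainder Theorem. The moduli and helper names are illustrative assumptions, not the cited paper's implementation.

```python
# Illustrative sketch of Residue Number System (RNS) arithmetic.
# The moduli below are hypothetical; the point is that products stay exact
# in small integer channels, which is attractive for Winograd transforms.

MODULI = (251, 241, 239)  # pairwise-coprime primes; dynamic range = 251*241*239

def to_rns(x):
    """Represent an integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def mul_rns(a, b):
    """Multiply channel-wise; no carries cross between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Recover the integer via the Chinese Remainder Theorem."""
    M = 1
    for m in MODULI:
        M *= m
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)  # modular inverse (Python >= 3.8)
    return x % M

if __name__ == "__main__":
    a, b = 1234, 5678
    # Exact because a*b fits within the product of the moduli.
    assert from_rns(mul_rns(to_rns(a), to_rns(b))) == a * b
    print(from_rns(mul_rns(to_rns(a), to_rns(b))))  # 7006652
```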
arXiv Detail & Related papers (2020-07-23T19:07:06Z)
- LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z)
- DWM: A Decomposable Winograd Method for Convolution Acceleration [29.312042061351782]
Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing.
However, it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and it fails for convolutions with stride larger than 1.
We propose a novel Decomposable Winograd Method (DWM), which breaks through these limitations of the original Winograd minimal filtering algorithm and extends it to a wide range of general convolutions; a minimal decomposition sketch follows this entry.
arXiv Detail & Related papers (2020-02-03T03:42:56Z)
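As an illustration of the kernel-decomposition idea shared by DWM and the linear decomposition scheme discussed in the main abstract, the sketch below splits a 1-D 5-tap correlation into a 3-tap piece and a 2-tap piece whose partial results are summed at a shifted offset; each small piece is then a candidate for a small-kernel Winograd transform such as the F(2,3) example above. The split point, signal sizes, and kernel values are illustrative assumptions, not the exact formulation of either paper.

```python
# Illustrative sketch (not DWM's or the nested algorithm's exact formulation):
# a long 1-D correlation equals the sum of shorter correlations with the
# kernel's pieces, evaluated at shifted input offsets.
import numpy as np

def corr_valid(x, g):
    """Direct 'valid' correlation of signal x with kernel g."""
    k = len(g)
    return np.array([np.dot(x[i:i + k], g) for i in range(len(x) - k + 1)])

x = np.arange(12, dtype=float)              # toy input signal
g = np.array([1.0, -2.0, 3.0, 0.5, -1.0])   # 5-tap kernel, split as 3 + 2

full = corr_valid(x, g)                     # reference: one 5-tap correlation
n = len(full)
# Partial correlations with the two kernel pieces, summed at offset 3.
decomposed = corr_valid(x, g[:3])[:n] + corr_valid(x, g[3:])[3:3 + n]
assert np.allclose(full, decomposed)
print(decomposed)
```

In 2-D the same identity applies along both axes, so a large kxk kernel can be covered by a grid of small tiles whose shifted partial convolutions are summed.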
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.