SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
- URL: http://arxiv.org/abs/2407.02913v1
- Date: Wed, 3 Jul 2024 08:38:14 GMT
- Title: SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
- Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du,
- Abstract summary: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models.
These algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization.
We propose SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform with symbolic computing.
We show that our new algorithms can further improve the efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.
- Score: 20.150429327542128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we proposes SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform (DFT) with symbolic computing, in which only additions are required to perform the transformation at specific transform points, avoiding the calculation of irrational number and reducing the requirement for precision. Additionally, we enhance convolution efficiency by introducing correction terms to convert invalid circular convolution outputs of the Fourier method into effective ones. The numerical error analysis is presented for the first time in this type of work and proves that our algorithms can provide a 3.68x multiplication reduction for 3x3 convolution, while the Winograd algorithm only achieves a 2.25x reduction with similarly low numerical errors. Experiments carried out on benchmarks and FPGA show that our new algorithms can further improve the computation efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.
Related papers
- Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method to meet requirements through the minimization objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which solely branches over the parameters, thereby boosting convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art transformations.
arXiv Detail & Related papers (2024-05-14T13:28:57Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Sparse Binary Transformers for Multivariate Time Series Modeling [1.3965477771846404]
We show that lightweight Compressed Neural Networks can achieve accuracy comparable to dense floating-point Transformers.
Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting.
We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count.
arXiv Detail & Related papers (2023-08-09T00:23:04Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1)
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR.
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms which can align partially overlapping point sets are invariant to the corresponding corresponding RPM algorithm.
We first show that the objective is a cubic bound function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - Accelerating Neural Network Inference by Overflow Aware Quantization [16.673051600608535]
Inherited heavy computation of deep neural networks prevents their widespread applications.
We propose an overflow aware quantization method by designing trainable adaptive fixed-point representation.
With the proposed method, we are able to fully utilize the computing power to minimize the quantization loss and obtain optimized inference performance.
arXiv Detail & Related papers (2020-05-27T11:56:22Z) - LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural
Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.