A Deep Learning Inference Scheme Based on Pipelined Matrix
Multiplication Acceleration Design and Non-uniform Quantization
- URL: http://arxiv.org/abs/2110.04861v1
- Date: Sun, 10 Oct 2021 17:31:27 GMT
- Title: A Deep Learning Inference Scheme Based on Pipelined Matrix
Multiplication Acceleration Design and Non-uniform Quantization
- Authors: Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei
Li, Jiyuan Zhang, Guan Wang, Zhen Chen
- Abstract summary: We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
- Score: 9.454905560571085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Matrix multiplication is the bedrock of Deep Learning inference applications.
When it comes to hardware acceleration on edge computing devices, matrix
multiplication often takes up the great majority of execution time. To achieve
better performance in edge computing, we introduce a low-power Multi-layer
Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme
and a nonuniform quantization methodology. The implementation runs on
Field-programmable Gate Array (FPGA) devices, and its performance is tested on
handwritten digit classification and Q-learning tasks. Results show that our
method achieves better performance with lower power consumption.
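The abstract does not detail how the non-uniform levels are chosen, so the following is only a minimal sketch of one common non-uniform scheme: a codebook fitted to the weight distribution by 1-D k-means. The function `nonuniform_quantize` and all its parameters are illustrative, not taken from the paper.
```python
import numpy as np

def nonuniform_quantize(w, n_levels=16, n_iter=20):
    """Quantize weights to a learned, non-uniform codebook via 1-D k-means.
    Illustrative only: the paper's actual level-placement rule is not given
    in the abstract above."""
    flat = w.ravel()
    # Start the codebook at evenly spaced quantiles of the weight distribution,
    # so dense regions of the distribution receive more levels.
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iter):
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(n_levels):
            if np.any(idx == k):          # move each level to its cluster mean
                codebook[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx].reshape(w.shape), codebook

w = np.random.randn(64, 64)
wq, levels = nonuniform_quantize(w)       # wq stores only 16 distinct values
```
Because the levels track the weight density, small weights near zero are resolved more finely than a uniform quantizer of the same bit width would allow.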
Related papers
- Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA [10.630802853096462]
Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations.
This paper proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs.
Our AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
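The summary does not describe the AMU's internals, so the following is only a hypothetical illustration of the general non-element-wise idea: with B-bit quantized operands there are just 2^B distinct values, so every possible product can be precomputed once and the inner loop reduced to table lookups and additions.
```python
import numpy as np

# Hypothetical illustration: precompute all products of 4-bit codes once,
# then compute dot products with lookups and additions only.
B = 4
levels = np.arange(2 ** B)
LUT = np.outer(levels, levels)                  # 16 x 16 table of all products

def lut_matmul(Aq, Bq):
    """Integer matrix product computed without runtime multiplications."""
    n, k = Aq.shape
    out = np.zeros((n, Bq.shape[1]), dtype=np.int64)
    for i in range(n):
        for j in range(Bq.shape[1]):
            out[i, j] = LUT[Aq[i], Bq[:, j]].sum()   # k lookups + adds
    return out

Aq = np.random.randint(0, 2 ** B, (3, 5))
Bq = np.random.randint(0, 2 ** B, (5, 2))
assert np.array_equal(lut_matmul(Aq, Bq), Aq @ Bq)
```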
arXiv Detail & Related papers (2024-07-02T15:28:10Z)
- Many-body computing on Field Programmable Gate Arrays [5.612626580467746]
We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a remarkable tenfold speedup compared to CPU-based computation.
arXiv Detail & Related papers (2024-02-09T14:01:02Z)
- CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA.
By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms.
We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
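As a rough illustration of compositional dispatch (deliberately not CoLA's actual API), a solve can be routed by the operator's algebraic structure instead of materializing a dense matrix:
```python
import numpy as np

# Each structure knows its own cheapest solve rule; compositions recurse.

class Diagonal:
    def __init__(self, d): self.d = d
    def solve(self, b): return b / self.d                   # O(n) elementwise

class Dense:
    def __init__(self, M): self.M = M
    def solve(self, b): return np.linalg.solve(self.M, b)   # generic fallback

class Composition:
    """A = left @ right, so A^{-1} b = right^{-1} (left^{-1} b)."""
    def __init__(self, left, right): self.left, self.right = left, right
    def solve(self, b): return self.right.solve(self.left.solve(b))

A = Composition(Diagonal(np.array([1.0, 2.0, 4.0])), Dense(np.eye(3)))
x = A.solve(np.ones(3))   # routed through the cheap rules, no dense A formed
```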
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
- Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms [0.46040036610482665]
We propose a multi-step training method for generalized linear classifiers.
Validation error is minimized by pruning unnecessary inputs.
Desired outputs are improved via a method similar to the Ho-Kashyap rule.
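A minimal sketch of the input-pruning step, assuming a plain least-squares classifier and greedy backward elimination; the paper's second-order training and Ho-Kashyap-style output adjustment are not reproduced here.
```python
import numpy as np

def prune_inputs(Xtr, ytr, Xva, yva):
    """Greedy backward pruning: repeatedly drop the input whose removal most
    lowers validation error; stop when every removal makes things worse."""
    def val_err(cols):
        W, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
        return ((Xva[:, cols] @ W - yva) ** 2).mean()

    cols = list(range(Xtr.shape[1]))
    best = val_err(cols)
    while len(cols) > 1:
        err, j = min((val_err([c for c in cols if c != j]), j) for j in cols)
        if err >= best:
            break                       # no remaining input is unnecessary
        best, cols = err, [c for c in cols if c != j]
    return cols, best
```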
arXiv Detail & Related papers (2023-08-11T16:48:31Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
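The batched, vision-specific design is not reproduced here, but the classic (unshifted) QR iteration that QR-based eigendecomposition builds on is short enough to sketch:
```python
import numpy as np

def qr_eigvals(A, n_iter=500):
    """Classic unshifted QR iteration: A_{k+1} = R_k Q_k is similar to A_k,
    so eigenvalues are preserved while A_k drifts toward triangular form."""
    Ak = A.copy()
    for _ in range(n_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.sort(np.diag(Ak))

S = np.random.randn(6, 6)
S = S + S.T                                   # symmetric test matrix
print(qr_eigvals(S))
print(np.sort(np.linalg.eigvalsh(S)))         # should agree closely
```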
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
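The exact architecture is not given in the summary, so the following is a hypothetical sketch of one way to combine linear and non-linear processing: a purely linear branch summed with a small MLP branch. All names and sizes are illustrative.
```python
import numpy as np

rng = np.random.default_rng(0)

def init(d_in, d_hidden, d_out):
    return {
        "Wlin": rng.normal(0, 0.1, (d_out, d_in)),      # linear branch
        "W1": rng.normal(0, 0.1, (d_hidden, d_in)),     # nonlinear branch
        "W2": rng.normal(0, 0.1, (d_out, d_hidden)),
    }

def forward(params, x):
    linear = params["Wlin"] @ x                         # cheap linear path
    nonlinear = params["W2"] @ np.tanh(params["W1"] @ x)
    return linear + nonlinear                           # sum the two branches

y = forward(init(8, 16, 4), rng.normal(size=8))
```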
arXiv Detail & Related papers (2022-06-13T09:38:23Z)
- High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
On several simulations, our method scales better than existing approaches in computation time and memory.
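A minimal sketch of the two ingredients named above, assuming the well-known Rademacher-probe diagonal estimator E[z * (A^{-1} z)] = diag(A^{-1}) together with SciPy's conjugate gradient for matrix-free solves; everything beyond that is illustrative.
```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def diag_of_inverse(matvec, n, n_probes=64, seed=0):
    """Estimate diag(A^{-1}) without forming A or its inverse, using the
    probing identity E[z * (A^{-1} z)] = diag(A^{-1}) for Rademacher z."""
    A = LinearOperator((n, n), matvec=matvec)
    rng = np.random.default_rng(seed)
    acc = np.zeros(n)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        x, _ = cg(A, z)                    # matrix-free CG solve of A x = z
        acc += z * x
    return acc / n_probes                  # error shrinks with more probes

# Toy SPD system accessed only through matvecs.
M = np.random.randn(50, 50)
A_dense = M @ M.T + np.eye(50)
est = diag_of_inverse(lambda v: A_dense @ v, 50)
```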
arXiv Detail & Related papers (2022-02-25T16:35:26Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
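A minimal sketch of the standard randomized SVD (range finding, QR, then a small exact SVD); swapping `numpy` for `cupy` is one common way to run the same steps on a GPU, though the paper's own kernels are not reproduced here.
```python
import numpy as np   # swap in cupy for a GPU run of the same steps

def randomized_svd(A, k, oversample=10, n_power=2):
    """Halko-style randomized SVD: sketch the range, orthonormalize,
    then take an exact SVD of the small projected matrix."""
    Omega = np.random.randn(A.shape[1], k + oversample)   # random projection
    Y = A @ Omega
    for _ in range(n_power):    # power iterations sharpen the range estimate
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

A = np.random.randn(1000, 200)
U, s, Vt = randomized_svd(A, k=10)
```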
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices [6.335302509003343]
We present a novel framework for designing multiplierless kernel machines.
The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique.
We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform.
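A minimal sketch of the margin propagation primitive, assuming its usual definition: find z with sum_i max(0, x_i - z) = gamma, a piecewise-linear stand-in for log-sum-exp that needs only additions and comparisons in hardware (the bisection below is just a convenient software solver).
```python
import numpy as np

def margin_propagation(x, gamma, n_iter=60):
    """Find z with sum_i max(0, x_i - z) = gamma; the residual is monotone
    decreasing in z, so bisection over [min(x)-gamma, max(x)] converges."""
    lo, hi = x.min() - gamma, x.max()
    for _ in range(n_iter):
        z = 0.5 * (lo + hi)
        if np.maximum(0.0, x - z).sum() > gamma:
            lo = z                       # residual too large: raise threshold
        else:
            hi = z
    return 0.5 * (lo + hi)

print(margin_propagation(np.array([1.0, 2.0, 3.0]), gamma=1.0))
```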
arXiv Detail & Related papers (2021-06-03T16:06:08Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but have so far been hard to apply to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
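A minimal sketch in the spirit of Nyström-based solvers: restrict the kernel expansion to m random inducing points and solve the reduced system (the paper's solver additionally uses preconditioned conjugate gradient and careful GPU memory management, not shown here).
```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def nystrom_krr(X, y, m=500, lam=1e-3, sigma=1.0):
    """Kernel ridge regression restricted to m random inducing points:
    solve (Knm^T Knm + lam * n * Kmm) alpha = Knm^T y."""
    idx = np.random.choice(len(X), size=min(m, len(X)), replace=False)
    C = X[idx]                                   # inducing points
    Knm = gaussian_kernel(X, C, sigma)           # n x m cross-kernel
    Kmm = gaussian_kernel(C, C, sigma)           # m x m kernel
    alpha = np.linalg.solve(Knm.T @ Knm + lam * len(X) * Kmm, Knm.T @ y)
    return C, alpha        # predict: gaussian_kernel(Xtest, C, sigma) @ alpha
```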
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
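A minimal sketch of predictive coding on a two-layer network, assuming the standard formulation: clamp the output to the target, relax the hidden activity to minimize the prediction-error energy, then apply purely local weight updates; at equilibrium the errors approximate backprop's gradients. All names and hyperparameters are illustrative.
```python
import numpy as np

def f(a):  return np.tanh(a)
def df(a): return 1.0 - np.tanh(a) ** 2

def pc_step(x0, target, W1, W2, n_infer=50, dt=0.1, lr=1e-2):
    """One predictive-coding step on x2 = f(W2 f(W1 x0))."""
    x1 = f(W1 @ x0)                       # start at the feed-forward values
    x2 = target                           # output layer clamped to the label
    for _ in range(n_infer):              # gradient descent on the energy
        e1 = x1 - f(W1 @ x0)              # prediction error at layer 1
        e2 = x2 - f(W2 @ x1)              # prediction error at layer 2
        x1 += dt * (-e1 + W2.T @ (e2 * df(W2 @ x1)))
    # At equilibrium the errors approximate backprop's gradients.
    e1, e2 = x1 - f(W1 @ x0), x2 - f(W2 @ x1)
    W1 += lr * np.outer(e1 * df(W1 @ x0), x0)   # local Hebbian-style updates
    W2 += lr * np.outer(e2 * df(W2 @ x1), x1)
    return W1, W2
```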
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.