Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir
Computing
- URL: http://arxiv.org/abs/2101.08884v1
- Date: Thu, 21 Jan 2021 23:16:22 GMT
- Title: Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir
Computing
- Authors: Matthew Denton and Herman Schmit
- Abstract summary: Reservoir computing systems rely on the recurrent multiplication of a very large, sparse, fixed matrix.
We argue that direct implementation of these fixed matrices minimizes the work performed in the computation.
We present the structure of our bit-serial matrix multiplier, and evaluate using canonical signed digit representation to further reduce logic utilization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reservoir computing systems rely on the recurrent multiplication of a very
large, sparse, fixed matrix. We argue that direct spatial implementation of
these fixed matrices minimizes the work performed in the computation, and
allows for significant reduction in latency and power through constant
propagation and logic minimization. Bit-serial arithmetic enables massive
static matrices to be implemented. We present the structure of our bit-serial
matrix multiplier, and evaluate using canonical signed digit representation to
further reduce logic utilization. We have implemented these matrices on a large
FPGA and provide a cost model that is simple and extensible. These FPGA
implementations reduce latency by 50x on average, and by up to 86x, versus GPU
libraries. Comparing against a recent sparse DNN accelerator, we measure a 4.1x
to 47x reduction in latency depending on matrix dimension and sparsity.
Throughput of the FPGA solution is also competitive for a wide range of matrix
dimensions and batch sizes. Finally, we discuss ways these techniques could be
deployed in ASICs, making them applicable for dynamic sparse matrix
computations.
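To make the constant-propagation and CSD arguments concrete, the sketch below (illustrative Python, not the paper's FPGA implementation) shows canonical signed digit recoding of a fixed coefficient; fewer nonzero digits mean fewer add/subtract terms in a hard-wired constant multiplier.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed digit (CSD) recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least-significant first, with no two
    adjacent nonzero digits. Each nonzero digit costs one add/subtract in
    a hard-wired constant multiplier, so fewer digits means less logic.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 7 = 0b111 needs two adders in plain binary (4x + 2x + x), but only one
# subtractor in CSD: 7 = 8 - 1, i.e. (x << 3) - x.
assert to_csd(7) == [-1, 0, 0, 1]
```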
Related papers
- Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference.
We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization.
We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
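The bit-level decompose-and-recover idea can be previewed with a small NumPy sketch (hypothetical; the paper's bipolar-INT format and tensor-core kernels are more involved):

```python
import numpy as np

def bitplane_matmul(A: np.ndarray, B: np.ndarray, bits: int = 4) -> np.ndarray:
    """Exact unsigned-integer matmul recomposed from 1-bit plane products."""
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(bits):
        Ai = (A >> i) & 1                  # i-th bit plane of A
        for j in range(bits):
            Bj = (B >> j) & 1              # j-th bit plane of B
            acc += (Ai @ Bj).astype(np.int64) << (i + j)
    return acc

rng = np.random.default_rng(0)
A = rng.integers(0, 16, (8, 8))
B = rng.integers(0, 16, (8, 8))
assert np.array_equal(bitplane_matmul(A, B), A @ B)
```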
arXiv Detail & Related papers (2024-09-26T14:17:58Z)
- Fast inference with Kronecker-sparse matrices [4.387337528923525]
We present the first energy and time benchmarks for the multiplication with Kronecker-sparse matrices.
Our benchmark also reveals that specialized implementations spend up to 50% of their total runtime on memory rewriting operations.
We implement a new kernel that achieves a median speed-up of x1.4, while also cutting energy consumption by 15%.
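For intuition, multiplying by a Kronecker product never requires materializing the large matrix; a minimal NumPy sketch of the standard identity (A ⊗ B) vec(X) = vec(B X A^T) follows (the paper's kernels additionally exploit the sparsity of the Kronecker-sparse factors):

```python
import numpy as np

def kron_matvec(A: np.ndarray, B: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute (A kron B) @ x without forming the Kronecker product."""
    (m, n), (p, q) = A.shape, B.shape
    X = x.reshape(q, n, order="F")              # un-vectorize (column-major)
    return (B @ X @ A.T).reshape(-1, order="F")  # re-vectorize

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 2))
x = rng.standard_normal(4 * 2)
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```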
arXiv Detail & Related papers (2024-05-23T19:36:10Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's covariance matrices.
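A hedged sketch of the general avoid-the-explicit-covariance pattern (illustrative only, not the paper's algorithm): solve the regularized normal equations matrix-free with conjugate gradients, so only matrix-vector products with Phi are ever formed.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
m, d, alpha = 50, 200, 0.1
Phi = rng.standard_normal((m, d))
y = rng.standard_normal(m)

# Solve (Phi^T Phi + alpha * I) x = Phi^T y without the d x d matrix.
matvec = lambda v: Phi.T @ (Phi @ v) + alpha * v   # O(m*d) per product
x, info = cg(LinearOperator((d, d), matvec=matvec), Phi.T @ y)
assert info == 0  # converged
```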
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation [3.72826300260966]
We present an implementation of a CSR-based sparse matrix wrapper for PyTorch with acceleration for basic matrix operations, as well as automatic differentiability.
We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation along with performance measurements against their dense counterparts.
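The pattern the paper builds on can be previewed with PyTorch's built-in sparse support (COO shown here for simplicity; the paper's wrapper is CSR-based):

```python
import torch

indices = torch.tensor([[0, 1, 2], [1, 0, 2]])
values = torch.tensor([2.0, 3.0, 4.0])
A = torch.sparse_coo_tensor(indices, values, (3, 3))

X = torch.randn(3, 2, requires_grad=True)
loss = torch.sparse.mm(A, X).sum()    # sparse @ dense product
loss.backward()                       # gradient flows to the dense operand
print(X.grad)                         # equals A.T @ ones(3, 2)
```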
arXiv Detail & Related papers (2022-12-10T00:46:51Z)
- RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations [56.59168541623729]
Training graph neural networks (GNNs) is time consuming because sparse graph-based operations are hard to accelerate in hardware.
We explore trading off the computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which, for the first time, demonstrates the potential of training GNNs with approximated operations.
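One classical instance of sampling-based approximation for a matrix product (an illustration of the idea, not RSC itself) is Monte Carlo column/row sampling:

```python
import numpy as np

def sampled_matmul(A: np.ndarray, H: np.ndarray, k: int, seed: int = 0):
    """Unbiased estimate of A @ H from k sampled column/row pairs."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    p = np.full(n, 1.0 / n)              # uniform sampling for simplicity
    idx = rng.choice(n, size=k, p=p)
    scale = 1.0 / (k * p[idx])           # importance-sampling weights
    return (A[:, idx] * scale) @ H[idx, :]
```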
arXiv Detail & Related papers (2022-10-19T17:25:33Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
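The underlying QR iteration is compact; below is an unshifted, undeflated NumPy sketch (the paper's method adds the batching and shifts needed in practice):

```python
import numpy as np

def qr_eig(S: np.ndarray, iters: int = 200):
    """Plain QR iteration for a symmetric matrix S."""
    A = S.copy()
    V = np.eye(S.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q          # similarity transform: eigenvalues preserved
        V = V @ Q          # accumulate eigenvectors
    return np.diag(A), V   # A converges toward a diagonal matrix
```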
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
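The randomized SVD reduces to a handful of GPU-friendly dense primitives; a NumPy sketch of the standard Halko-Martinsson-Tropp recipe (the paper maps these same steps onto the GPU):

```python
import numpy as np

def randomized_svd(A: np.ndarray, k: int, oversample: int = 10, seed: int = 0):
    """Rank-k truncated SVD via random range finding."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for range(A)
    B = Q.T @ A                           # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```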
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing.
We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
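As background, the simplest correlation-based estimator for 1-bit measurements recovers the signal direction from A^T y (a classical baseline shown here; the paper's algorithm adds robustness guarantees and a generative prior):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 100
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # 1-bit CS recovers direction only
A = rng.standard_normal((m, n))
y = np.sign(A @ x)                     # 1-bit (sign) measurements

x_hat = A.T @ y                        # correlation estimator
x_hat /= np.linalg.norm(x_hat)
print(float(x_hat @ x))                # close to 1: direction recovered
```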
arXiv Detail & Related papers (2021-08-08T05:28:06Z)
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD matrices and their "square-roots".
We show how these techniques can be used for multiple downstream tasks.
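A generic sketch-and-solve example for regression (illustrative only; the paper's contribution is extending such guarantees to non-PSD and square-root matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10000, 20, 200
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = rng.standard_normal((k, n)) / np.sqrt(k)    # Gaussian sketch, k << n
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))       # small approximation error
```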
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves, on average, a 3712× speedup with 1301.25× energy reduction versus CPU, and a 35.4× speedup with 17.66× energy reduction versus GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
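The SVD-to-pseudoinverse step that FastPI computes incrementally looks like this in batch form (a sketch; FastPI's novelty is the sparse reordering and incremental update, not this formula):

```python
import numpy as np

def pinv_svd(A: np.ndarray, rcond: float = 1e-12) -> np.ndarray:
    """Moore-Penrose pseudoinverse: A+ = V diag(1/s) U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)  # drop tiny modes
    return (Vt.T * s_inv) @ U.T
```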
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.