Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir
Computing
- URL: http://arxiv.org/abs/2101.08884v1
- Date: Thu, 21 Jan 2021 23:16:22 GMT
- Title: Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir
Computing
- Authors: Matthew Denton and Herman Schmit
- Abstract summary: Reservoir computing systems rely on the recurrent multiplication of a very large, sparse, fixed matrix.
We argue that direct implementation of these fixed matrices minimizes the work performed in the computation.
We present the structure of our bit-serial matrix multiplier, and evaluate using canonical signed digit representation to further reduce logic utilization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reservoir computing systems rely on the recurrent multiplication of a very
large, sparse, fixed matrix. We argue that direct spatial implementation of
these fixed matrices minimizes the work performed in the computation, and
allows for significant reduction in latency and power through constant
propagation and logic minimization. Bit-serial arithmetic enables massive
static matrices to be implemented. We present the structure of our bit-serial
matrix multiplier, and evaluate using canonical signed digit representation to
further reduce logic utilization. We have implemented these matrices on a large
FPGA and provide a cost model that is simple and extensible. These FPGA
implementations reduce latency by 50x on average, and by up to 86x, versus GPU
libraries. Comparing against a recent sparse DNN accelerator, we measure a 4.1x
to 47x reduction in latency depending on matrix dimension and sparsity.
Throughput of the FPGA solution is also competitive for a wide range of matrix
dimensions and batch sizes. Finally, we discuss ways these techniques could be
deployed in ASICs, making them applicable for dynamic sparse matrix
computations.
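To make the constant-propagation and CSD arguments concrete, the sketch below (illustrative Python, not the paper's FPGA implementation) shows canonical signed digit recoding of a fixed coefficient; fewer nonzero digits mean fewer add/subtract terms in a hard-wired constant multiplier.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed digit (CSD) recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least-significant first, with no two
    adjacent nonzero digits. Each nonzero digit costs one add/subtract in
    a hard-wired constant multiplier, so fewer digits means less logic.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 7 = 0b111 needs two adders in plain binary (4x + 2x + x), but only one
# subtractor in CSD: 7 = 8 - 1, i.e. (x << 3) - x.
assert to_csd(7) == [-1, 0, 0, 1]
```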
Related papers
- Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference.
We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization.
We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
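The bit-level decompose-and-recover idea can be previewed with a small NumPy sketch (hypothetical; the paper's bipolar-INT format and tensor-core kernels are more involved):

```python
import numpy as np

def bitplane_matmul(A: np.ndarray, B: np.ndarray, bits: int = 4) -> np.ndarray:
    """Exact unsigned-integer matmul recomposed from 1-bit plane products."""
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(bits):
        Ai = (A >> i) & 1                  # i-th bit plane of A
        for j in range(bits):
            Bj = (B >> j) & 1              # j-th bit plane of B
            acc += (Ai @ Bj).astype(np.int64) << (i + j)
    return acc

rng = np.random.default_rng(0)
A = rng.integers(0, 16, (8, 8))
B = rng.integers(0, 16, (8, 8))
assert np.array_equal(bitplane_matmul(A, B), A @ B)
```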
arXiv Detail & Related papers (2024-09-26T14:17:58Z)
- Fast inference with Kronecker-sparse matrices [4.387337528923525]
We present the first energy and time benchmarks for the multiplication with Kronecker-sparse matrices.
Our benchmark also reveals that specialized implementations spend up to 50% of their total runtime on memory rewriting operations.
We implement a new kernel that achieves a median speed-up of x1.4, while also cutting energy consumption by 15%.
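For intuition, multiplying by a Kronecker product never requires materializing the large matrix; a minimal NumPy sketch of the standard identity (A ⊗ B) vec(X) = vec(B X A^T) follows (the paper's kernels additionally exploit the sparsity of the Kronecker-sparse factors):

```python
import numpy as np

def kron_matvec(A: np.ndarray, B: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute (A kron B) @ x without forming the Kronecker product."""
    (m, n), (p, q) = A.shape, B.shape
    X = x.reshape(q, n, order="F")              # un-vectorize (column-major)
    return (B @ X @ A.T).reshape(-1, order="F")  # re-vectorize

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 2))
x = rng.standard_normal(4 * 2)
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```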
arXiv Detail & Related papers (2024-05-23T19:36:10Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's covariance matrices.
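A hedged sketch of the general avoid-the-explicit-covariance pattern (illustrative only, not the paper's algorithm): solve the regularized normal equations matrix-free with conjugate gradients, so only matrix-vector products with Phi are ever formed.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
m, d, alpha = 50, 200, 0.1
Phi = rng.standard_normal((m, d))
y = rng.standard_normal(m)

# Solve (Phi^T Phi + alpha * I) x = Phi^T y without the d x d matrix.
matvec = lambda v: Phi.T @ (Phi @ v) + alpha * v   # O(m*d) per product
x, info = cg(LinearOperator((d, d), matvec=matvec), Phi.T @ y)
assert info == 0  # converged
```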
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation [3.72826300260966]
We present an implementation of a CSR-based sparse matrix wrapper for PyTorch with acceleration for basic matrix operations, as well as automatic differentiability.
We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation along with performance measurements against their dense counterparts.
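The pattern the paper builds on can be previewed with PyTorch's built-in sparse support (COO shown here for simplicity; the paper's wrapper is CSR-based):

```python
import torch

indices = torch.tensor([[0, 1, 2], [1, 0, 2]])
values = torch.tensor([2.0, 3.0, 4.0])
A = torch.sparse_coo_tensor(indices, values, (3, 3))

X = torch.randn(3, 2, requires_grad=True)
loss = torch.sparse.mm(A, X).sum()    # sparse @ dense product
loss.backward()                       # gradient flows to the dense operand
print(X.grad)                         # equals A.T @ ones(3, 2)
```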
arXiv Detail & Related papers (2022-12-10T00:46:51Z)
- RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations [56.59168541623729]
Training graph neural networks (GNNs) is time consuming because sparse graph-based operations are hard to accelerate in hardware.
We explore trading off the computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which, for the first time, demonstrates the potential of training GNNs with approximated operations.
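One classical instance of sampling-based approximation for a matrix product (an illustration of the idea, not RSC itself) is Monte Carlo column/row sampling:

```python
import numpy as np

def sampled_matmul(A: np.ndarray, H: np.ndarray, k: int, seed: int = 0):
    """Unbiased estimate of A @ H from k sampled column/row pairs."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    p = np.full(n, 1.0 / n)              # uniform sampling for simplicity
    idx = rng.choice(n, size=k, p=p)
    scale = 1.0 / (k * p[idx])           # importance-sampling weights
    return (A[:, idx] * scale) @ H[idx, :]
```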
arXiv Detail & Related papers (2022-10-19T17:25:33Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
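The underlying QR iteration is compact; below is an unshifted, undeflated NumPy sketch (the paper's method adds the batching and shifts needed in practice):

```python
import numpy as np

def qr_eig(S: np.ndarray, iters: int = 200):
    """Plain QR iteration for a symmetric matrix S."""
    A = S.copy()
    V = np.eye(S.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q          # similarity transform: eigenvalues preserved
        V = V @ Q          # accumulate eigenvectors
    return np.diag(A), V   # A converges toward a diagonal matrix
```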
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
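The randomized SVD reduces to a handful of GPU-friendly dense primitives; a NumPy sketch of the standard Halko-Martinsson-Tropp recipe (the paper maps these same steps onto the GPU):

```python
import numpy as np

def randomized_svd(A: np.ndarray, k: int, oversample: int = 10, seed: int = 0):
    """Rank-k truncated SVD via random range finding."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for range(A)
    B = Q.T @ A                           # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```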
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing.
We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
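As background, the simplest correlation-based estimator for 1-bit measurements recovers the signal direction from A^T y (a classical baseline shown here; the paper's algorithm adds robustness guarantees and a generative prior):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 100
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # 1-bit CS recovers direction only
A = rng.standard_normal((m, n))
y = np.sign(A @ x)                     # 1-bit (sign) measurements

x_hat = A.T @ y                        # correlation estimator
x_hat /= np.linalg.norm(x_hat)
print(float(x_hat @ x))                # close to 1: direction recovered
```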
arXiv Detail & Related papers (2021-08-08T05:28:06Z)
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD matrices and their "square-roots".
We show how these techniques can be used for multiple downstream tasks.
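A generic sketch-and-solve example for regression (illustrative only; the paper's contribution is extending such guarantees to non-PSD and square-root matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10000, 20, 200
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = rng.standard_normal((k, n)) / np.sqrt(k)    # Gaussian sketch, k << n
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))       # small approximation error
```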
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves, on average, a 3712× speedup with 1301.25× energy reduction versus CPU, and a 35.4× speedup with 17.66× energy reduction versus GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
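The SVD-to-pseudoinverse step that FastPI computes incrementally looks like this in batch form (a sketch; FastPI's novelty is the sparse reordering and incremental update, not this formula):

```python
import numpy as np

def pinv_svd(A: np.ndarray, rcond: float = 1e-12) -> np.ndarray:
    """Moore-Penrose pseudoinverse: A+ = V diag(1/s) U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)  # drop tiny modes
    return (Vt.T * s_inv) @ U.T
```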
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.