Masked Matrix Multiplication for Emergent Sparsity
- URL: http://arxiv.org/abs/2402.14118v1
- Date: Wed, 21 Feb 2024 20:36:08 GMT
- Title: Masked Matrix Multiplication for Emergent Sparsity
- Authors: Brian Wheatman, Meghana Madhyastha, and Randal Burns
- Abstract summary: Transformer models exhibit emergent sparsity in which computations perform selective sparse access to dense data.
We build a vectorized and parallel matrix-multiplication system A X B = C that eliminates unnecessary computations.
- Score: 1.4786952412297807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence workloads, especially transformer models, exhibit
emergent sparsity in which computations perform selective sparse access to
dense data. The workloads are inefficient on hardware designed for dense
computations and do not map well onto sparse data representations. We build a
vectorized and parallel matrix-multiplication system A X B = C that eliminates
unnecessary computations and avoids branches based on a runtime evaluation of
sparsity. We use a combination of dynamic code lookup to adapt to the specific
sparsity encoded in the B matrix and preprocessing of sparsity maps of the A
and B matrices to compute conditional branches once for the whole computation.
For a wide range of sparsity, from 60% to 95% zeros, our implementation
executes fewer instructions and delivers higher performance than Intel MKL's
dense or sparse matrix multiply routines. Benefits can be as large as a 2x
speedup and 4x fewer instructions.
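As a rough illustration of the approach, here is a minimal NumPy sketch that precomputes the sparsity maps of A and B once and only touches nonzero entries. The actual system is a vectorized, branch-free C++ kernel; all names below are hypothetical.

```python
import numpy as np

def masked_matmul(A, B):
    """Sketch of the masked product C = A @ B that skips work on zeros.

    Mirrors the paper's idea at a high level: sparsity maps of A and B are
    computed once, up front, so zero-checks are not repeated inside the
    inner loop. This NumPy version is only an illustration.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    # Preprocess the sparsity map of B once: nonzero columns per row of B.
    b_cols = [np.flatnonzero(B[p]) for p in range(k)]
    for i in range(m):
        # Sparsity map of A: visit only the nonzero entries of row i.
        for p in np.flatnonzero(A[i]):
            cols = b_cols[p]
            C[i, cols] += A[i, p] * B[p, cols]
    return C

# Quick check against the dense product on ~80%-zero inputs.
rng = np.random.default_rng(0)
A = rng.random((64, 64)) * (rng.random((64, 64)) > 0.8)
B = rng.random((64, 64)) * (rng.random((64, 64)) > 0.8)
assert np.allclose(masked_matmul(A, B), A @ B)
```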
Related papers
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose the Block Tensor-Train (BTT), a novel matrix family containing Monarch matrices, which we show performs better than dense matrices for the same compute on multiple tasks.
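For illustration only, a generic example of swapping a dense layer for a structured one; low-rank factorization is used here as an assumed stand-in, not the paper's BTT construction.

```python
import numpy as np

# Replacing a dense d x d layer with a low-rank factorization W ~= U @ V
# costs 2*d*r multiplies per apply instead of d*d.
rng = np.random.default_rng(0)
d, r = 1024, 32
U = rng.standard_normal((d, r)) / np.sqrt(r)
V = rng.standard_normal((r, d)) / np.sqrt(d)
x = rng.standard_normal(d)
y = U @ (V @ x)   # 2*d*r = 65,536 multiplies vs d*d = 1,048,576 dense
```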
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- AMULET: Adaptive Matrix-Multiplication-Like Tasks [6.094431019524036]
We extend an open-source compiler to recognize and optimize matrix multiplication-like tasks.
Our framework, called Amulet, uses both database-style and compiler optimization techniques.
Amulet typically performs within 15% of hand-tuned matrix multiplication libraries, while handling a much broader class of computations.
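A hypothetical sketch of what a "matrix-multiplication-like task" looks like, using a (min, +) product as the example; this shows only the shape of the task class, not Amulet's compiler machinery.

```python
import numpy as np

# Matmul-like tasks share matmul's loop nest but swap the scalar operators;
# e.g. (min, +) gives one step of all-pairs shortest paths.
def mm_like(A, B, combine, reduce, init):
    m, k = A.shape
    _, n = B.shape
    C = np.full((m, n), init, dtype=A.dtype)
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] = reduce(C[i, j], combine(A[i, p], B[p, j]))
    return C

D = np.array([[0., 3., np.inf], [np.inf, 0., 1.], [2., np.inf, 0.]])
# One (min, +) step of the shortest-path recurrence.
D2 = mm_like(D, D, combine=lambda a, b: a + b, reduce=min, init=np.inf)
```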
arXiv Detail & Related papers (2023-05-12T17:04:24Z)
- PopSparse: Accelerated block sparse matrix multiplication on IPU [0.5661403709207713]
We introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs.
We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run.
Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels.
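A minimal NumPy sketch of block-sparse multiplication, the operation PopSparse accelerates; the library itself targets IPUs, and the names below are hypothetical. A "static" pattern means the nonzero-block coordinates are fixed at compile time.

```python
import numpy as np

def block_sparse_matmul(blocks, pattern, B, m, bs):
    """C = S @ B where S is m x k with nonzero bs x bs blocks.

    pattern: list of (block_row, block_col) coordinates of nonzero blocks.
    blocks:  matching list of bs x bs dense blocks.
    """
    C = np.zeros((m, B.shape[1]))
    for (bi, bj), W in zip(pattern, blocks):
        C[bi*bs:(bi+1)*bs] += W @ B[bj*bs:(bj+1)*bs]
    return C

rng = np.random.default_rng(0)
m = k = 128
bs = 32
pattern = [(0, 1), (1, 3), (2, 0), (3, 2)]   # one nonzero block per block-row
blocks = [rng.standard_normal((bs, bs)) for _ in pattern]
B = rng.standard_normal((k, 16))
C = block_sparse_matmul(blocks, pattern, B, m, bs)
```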
arXiv Detail & Related papers (2023-03-29T20:00:19Z)
- RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations [56.59168541623729]
Training graph neural networks (GNNs) is time consuming because sparse graph-based operations are hard to accelerate in hardware.
We explore trading off computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which for the first time demonstrates the potential of training GNNs with approximated operations.
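As a sketch of the sampling idea, here is the classic randomized (sampled) matrix product, which trades precision for fewer operations; RSC applies this style of approximation inside GNN training, not this exact routine.

```python
import numpy as np

def sampled_matmul(A, B, num_samples, rng):
    """Unbiased estimate of A @ B from a sample of rank-one terms."""
    k = A.shape[1]
    # Importance weights: sample index p in proportion to |A[:,p]| * |B[p,:]|.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = norms / norms.sum()
    idx = rng.choice(k, size=num_samples, p=probs)
    C = np.zeros((A.shape[0], B.shape[1]))
    for p in idx:
        C += np.outer(A[:, p], B[p]) / (num_samples * probs[p])
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((50, 200)), rng.standard_normal((200, 50))
C_approx = sampled_matmul(A, B, num_samples=100, rng=rng)
```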
arXiv Detail & Related papers (2022-10-19T17:25:33Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method tailored to computer vision application scenarios.
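For reference, the textbook unshifted QR iteration that QR-based eigendecomposition methods build on; this is a minimal sketch, not the paper's batched variant.

```python
import numpy as np

def qr_eigvals(A, iters=200):
    """Unshifted QR iteration: A <- R @ Q keeps the spectrum and drives a
    symmetric matrix toward diagonal form."""
    A = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q                  # similarity transform, same eigenvalues
    return np.sort(np.diag(A))

# Well-separated spectrum so the unshifted iteration converges quickly.
rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((6, 6)))
S = Q0 @ np.diag([1., 2., 3., 4., 5., 6.]) @ Q0.T
assert np.allclose(qr_eigvals(S), np.arange(1., 7.), atol=1e-6)
```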
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
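A minimal sketch of a Monarch-style apply: two block-diagonal factors separated by a fixed permutation, so a d x d map costs roughly O(d^1.5) instead of O(d^2). The permutation convention here is an assumption; see the paper for the exact factorization.

```python
import numpy as np

def monarch_apply(L_blocks, R_blocks, x, b):
    """Apply block-diagonal R, a stride ("riffle") permutation, then
    block-diagonal L. Sketch of the Monarch structure, not the paper's code."""
    def block_diag(blocks, v):
        return np.concatenate([W @ c for W, c in zip(blocks, v.reshape(-1, b))])
    y = block_diag(R_blocks, x)
    y = y.reshape(b, -1).T.reshape(-1)   # fixed permutation (assumed convention)
    return block_diag(L_blocks, y)

rng = np.random.default_rng(0)
d, b = 16, 4                             # d = b * b
L = [rng.standard_normal((b, b)) for _ in range(b)]
R = [rng.standard_normal((b, b)) for _ in range(b)]
y = monarch_apply(L, R, rng.standard_normal(d), b)
```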
arXiv Detail & Related papers (2022-04-01T17:37:29Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
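The standard randomized SVD recipe (sketch, orthonormalize, decompose the small projection) is what such GPU implementations parallelize; here is a NumPy stand-in, not the paper's GPU code.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Randomized SVD in three steps: random sketch, QR for an orthonormal
    range basis, exact SVD of the small projected matrix."""
    rng = rng or np.random.default_rng()
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal range approximation
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 40)) @ rng.standard_normal((40, 200))  # rank 40
U, s, Vt = randomized_svd(A, rank=40, rng=rng)
assert np.allclose(U * s @ Vt, A, atol=1e-8)
```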
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD matrices and their "square-roots".
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing [0.0]
Reservoir computing systems rely on the recurrent multiplication of a very large, sparse, fixed matrix.
We argue that direct implementation of these fixed matrices minimizes the work performed in the computation.
We present the structure of our bit-serial matrix multiplier, and evaluate using canonical signed digit representation to further reduce logic utilization.
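A small sketch of canonical signed digit (CSD) encoding and why it reduces logic: fewer nonzero digits means fewer add/subtract stages per constant multiplier. This is the standard CSD recoding, not the paper's multiplier design.

```python
def to_csd(n):
    """CSD digits of a positive integer n, least significant first, with
    digits in {-1, 0, +1} and no two adjacent nonzeros."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 7 = 8 - 1: one subtractor instead of two adders for x * 7.
assert to_csd(7) == [-1, 0, 0, 1]
assert sum(d * (1 << i) for i, d in enumerate(to_csd(119))) == 119
```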
arXiv Detail & Related papers (2021-01-21T23:16:22Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
The pseudoinverse, a generalization of the matrix inverse, has been extensively used in machine learning.
We propose FastPI, a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
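For context, here is the SVD-based pseudoinverse primitive that FastPI builds on; FastPI itself is incremental and reuses work after reordering, so this sketch is only the non-incremental baseline.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse via SVD: invert the nonzero singular
    values and transpose the factors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol * s.max(), 1.0 / s, 0.0)
    return Vt.T @ (s_inv[:, None] * U.T)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```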
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
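One of the algorithmic ideas alluded to above, sketched with NumPy: a Nystrom low-rank factorization of the kernel matrix from m landmark points. This is an assumed illustration of the general technique, not the paper's full solver.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row sets X and Y."""
    d2 = ((X[:, None] - Y[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
m = 50
landmarks = X[rng.choice(len(X), m, replace=False)]
K_nm = rbf(X, landmarks)                       # n x m cross-kernel
K_mm = rbf(landmarks, landmarks)               # m x m landmark kernel
# Low-rank factor F with K ~= F @ F.T, so downstream solvers never need
# to form the full n x n kernel matrix.
L = np.linalg.cholesky(K_mm + 1e-8 * np.eye(m))
F = K_nm @ np.linalg.inv(L).T                  # n x m factor
```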
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.