AMULET: Adaptive Matrix-Multiplication-Like Tasks
- URL: http://arxiv.org/abs/2305.08872v1
- Date: Fri, 12 May 2023 17:04:24 GMT
- Title: AMULET: Adaptive Matrix-Multiplication-Like Tasks
- Authors: Junyoung Kim, Kenneth Ross, Eric Sedlar, Lukas Stadler
- Abstract summary: We extend an open-source compiler to recognize and optimize matrix multiplication-like tasks.
Our framework, called Amulet, uses both database-style and compiler optimization techniques.
Amulet typically performs within 15% of hand-tuned matrix multiplication libraries, while handling a much broader class of computations.
- Score: 6.094431019524036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many useful tasks in data science and machine learning applications can be
written as simple variations of matrix multiplication. However, users have
difficulty performing such tasks because existing matrix/vector libraries
support only a limited class of computations hand-tuned for each unique
hardware platform. Users can alternatively write the task as a simple nested
loop, but
current compilers are not sophisticated enough to generate fast code for the
task written in this way. To address these issues, we extend an open-source
compiler to recognize and optimize these matrix multiplication-like tasks. Our
framework, called Amulet, uses both database-style and compiler optimization
techniques to generate fast code tailored to its execution environment. We show
through experiments that Amulet achieves speedups on a variety of matrix
multiplication-like tasks compared to existing compilers. For large matrices
Amulet typically performs within 15% of hand-tuned matrix multiplication
libraries, while handling a much broader class of computations.
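To make the setting concrete, below is a minimal sketch, in Java, of a matrix-multiplication-like task written as a simple nested loop: it keeps the triple-loop shape of C = A * B but guards the inner update with a database-style filter. The language, sizes, and predicate are illustrative assumptions, not taken from the paper.

public class MatMulLikeTask {
    public static void main(String[] args) {
        int n = 4;
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        double[][] c = new double[n][n];
        // Fill the inputs with simple deterministic values.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        // The matrix-multiplication-like loop: an ordinary compiler sees a
        // plain loop nest, while a matmul-aware compiler can tile and
        // vectorize it like a matrix product.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    if (a[i][k] > 0)                  // database-style filter
                        c[i][j] += a[i][k] * b[k][j];
        System.out.println(java.util.Arrays.deepToString(c));
    }
}

A hand-tuned BLAS routine cannot express the filter, and a generic compiler does not recognize the loop nest as a matrix product; a framework like Amulet targets exactly this gap.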
Related papers
- Masked Matrix Multiplication for Emergent Sparsity [1.4786952412297807]
Transformer models exhibit emergent sparsity, in which computations make selective, sparse accesses to dense data.
We build a vectorized and parallel matrix-multiplication system computing A × B = C that eliminates unnecessary computations.
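As a rough illustration of the idea (a scalar sketch only; the paper's system is vectorized and parallel, and its masking scheme is an assumption here), the entire inner product can be skipped wherever a mask rules an output out:

public class MaskedMatMul {
    // Compute C = A * B only where mask[i][j] is true, skipping the inner
    // product entirely for masked-out outputs.
    static double[][] multiply(double[][] a, double[][] b, boolean[][] mask) {
        int n = a.length, p = b[0].length, m = b.length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++) {
                if (!mask[i][j]) continue;  // eliminate unnecessary computation
                double s = 0;
                for (int t = 0; t < m; t++)
                    s += a[i][t] * b[t][j];
                c[i][j] = s;
            }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        boolean[][] mask = {{true, false}, {false, true}};
        System.out.println(java.util.Arrays.deepToString(multiply(a, b, mask)));
    }
}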
arXiv Detail & Related papers (2024-02-21T20:36:08Z)
- CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA.
By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms.
We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
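The dispatch idea can be sketched in a few lines (a toy in Java, not CoLA's actual API, which is a Python library): each operator knows how to solve against itself, and a composition builds its solve from the solves of its parts.

interface LinOp { double[] solve(double[] b); }

// A diagonal operator: its structure allows an O(n) solve by division.
class Diagonal implements LinOp {
    final double[] d;
    Diagonal(double[] d) { this.d = d; }
    public double[] solve(double[] b) {
        double[] x = new double[b.length];
        for (int i = 0; i < b.length; i++) x[i] = b[i] / d[i];
        return x;
    }
}

// A lazy product A * B: its solve is composed from its parts, since
// (A B)^{-1} r = B^{-1} (A^{-1} r), without ever materializing A * B.
class Composed implements LinOp {
    final LinOp a, b;
    Composed(LinOp a, LinOp b) { this.a = a; this.b = b; }
    public double[] solve(double[] r) { return b.solve(a.solve(r)); }
}

class Demo {
    public static void main(String[] args) {
        LinOp op = new Composed(new Diagonal(new double[]{2, 4}),
                                new Diagonal(new double[]{1, 0.5}));
        // Solves (A B) x = r; prints [1.0, 1.0].
        System.out.println(java.util.Arrays.toString(op.solve(new double[]{2, 2})));
    }
}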
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
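For reference, the classical (unshifted, single-matrix) QR iteration that such methods build on looks as follows; the batching and GPU-oriented refinements of the paper are not reflected in this sketch.

public class QrIteration {
    // Repeatedly factor A = Q R (classical Gram-Schmidt) and set A <- R Q;
    // for a symmetric matrix the diagonal converges to the eigenvalues.
    static double[] eigenvalues(double[][] a, int iters) {
        int n = a.length;
        for (int it = 0; it < iters; it++) {
            double[][] q = new double[n][n], r = new double[n][n];
            for (int j = 0; j < n; j++) {
                double[] v = new double[n];
                for (int i = 0; i < n; i++) v[i] = a[i][j];
                for (int k = 0; k < j; k++) {
                    for (int i = 0; i < n; i++) r[k][j] += q[i][k] * a[i][j];
                    for (int i = 0; i < n; i++) v[i] -= r[k][j] * q[i][k];
                }
                for (int i = 0; i < n; i++) r[j][j] += v[i] * v[i];
                r[j][j] = Math.sqrt(r[j][j]);
                for (int i = 0; i < n; i++) q[i][j] = v[i] / r[j][j];
            }
            // A <- R Q is a similarity transform, preserving eigenvalues.
            double[][] next = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < n; k++)
                        next[i][j] += r[i][k] * q[k][j];
            a = next;
        }
        double[] eig = new double[n];
        for (int i = 0; i < n; i++) eig[i] = a[i][i];
        return eig;
    }

    public static void main(String[] args) {
        double[][] a = {{2, 1}, {1, 2}};        // eigenvalues 3 and 1
        System.out.println(java.util.Arrays.toString(eigenvalues(a, 50)));
    }
}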
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
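The core of randomized SVD (in the style of Halko et al.) is a sketch-then-factor scheme whose heavy steps are exactly the large, parallel matrix products a GPU handles well. A scalar sketch of the first stage follows; the rank k, the Gaussian test matrix, and the plain Gram-Schmidt step are illustrative assumptions, and the small dense SVD of the result is left to any standard solver.

import java.util.Random;

public class RandSvdSketch {
    // Returns B = Q^T A, where Q orthonormalizes Y = A * Omega (m x k).
    // A randomized SVD then takes the small dense SVD of B and maps it back.
    static double[][] sketch(double[][] a, int k, long seed) {
        int m = a.length, n = a[0].length;
        Random rnd = new Random(seed);
        // Gaussian test matrix Omega (n x k).
        double[][] omega = new double[n][k];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k; j++) omega[i][j] = rnd.nextGaussian();
        // Y = A * Omega: a random projection that captures A's range.
        double[][] y = new double[m][k];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < k; j++)
                for (int t = 0; t < n; t++) y[i][j] += a[i][t] * omega[t][j];
        // Orthonormalize the columns of Y in place (modified Gram-Schmidt).
        for (int j = 0; j < k; j++) {
            for (int p = 0; p < j; p++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += y[i][p] * y[i][j];
                for (int i = 0; i < m; i++) y[i][j] -= dot * y[i][p];
            }
            double norm = 0;
            for (int i = 0; i < m; i++) norm += y[i][j] * y[i][j];
            norm = Math.sqrt(norm);
            for (int i = 0; i < m; i++) y[i][j] /= norm;
        }
        // B = Q^T A (k x n): a small matrix whose SVD approximates A's.
        double[][] b = new double[k][n];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < n; j++)
                for (int t = 0; t < m; t++) b[i][j] += y[t][i] * a[t][j];
        return b;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {2, 4, 6}, {1, 1, 1}, {3, 6, 9}};
        System.out.println(java.util.Arrays.deepToString(sketch(a, 2, 42)));
    }
}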
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Multiplying Matrices Without Multiplying [0.0]
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning.
We introduce a learning-based algorithm for this task that greatly outperforms existing methods.
arXiv Detail & Related papers (2021-06-21T05:08:54Z)
- A matrix math facility for Power ISA(TM) processors [0.16910097443356495]
A new family of matrix math instructions, collectively known as the Matrix-Multiply Assist facility, has been introduced in Power ISA(TM) Version 3.1.
These instructions have led to a power- and area-efficient implementation of a high-throughput math engine in the future POWER10 processor.
Performance per core is 4 times better, at constant frequency, than that of the previous-generation POWER9 processor.
arXiv Detail & Related papers (2021-04-07T14:17:32Z)
- What if Neural Networks had SVDs? [66.91160214071088]
Various neural networks employ time-consuming matrix operations, such as matrix inversion.
We present an algorithm that is fast enough to speed up several such matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
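One of the algorithmic ideas named above, random projections, can be sketched in isolation: random Fourier features (Rahimi and Recht) replace an RBF kernel with an explicit low-dimensional feature map, so kernel methods scale to many points. The feature count D and bandwidth gamma below are illustrative assumptions; the paper's actual solver combines several such ideas on GPU hardware.

import java.util.Random;

public class RandomFeatures {
    // Map x to z(x) so that z(x) . z(y) approximates the RBF kernel
    // exp(-gamma * ||x - y||^2)  (random Fourier features).
    static double[] featurize(double[] x, double[][] w, double[] b) {
        int D = w.length;
        double[] z = new double[D];
        for (int i = 0; i < D; i++) {
            double dot = b[i];
            for (int j = 0; j < x.length; j++) dot += w[i][j] * x[j];
            z[i] = Math.sqrt(2.0 / D) * Math.cos(dot);
        }
        return z;
    }

    public static void main(String[] args) {
        int dim = 3, D = 2048;
        double gamma = 0.5;                       // illustrative bandwidth
        Random rnd = new Random(0);
        double[][] w = new double[D][dim];
        double[] b = new double[D];
        for (int i = 0; i < D; i++) {
            b[i] = rnd.nextDouble() * 2 * Math.PI;               // random phase
            for (int j = 0; j < dim; j++)
                w[i][j] = rnd.nextGaussian() * Math.sqrt(2 * gamma); // RBF spectrum
        }
        double[] x = {1, 0, 0}, y = {0.8, 0.1, 0};
        double[] zx = featurize(x, w, b), zy = featurize(y, w, b);
        double approx = 0;
        for (int i = 0; i < D; i++) approx += zx[i] * zy[i];
        double d2 = 0;
        for (int j = 0; j < dim; j++) d2 += (x[j] - y[j]) * (x[j] - y[j]);
        System.out.printf("approx=%.4f exact=%.4f%n", approx, Math.exp(-gamma * d2));
    }
}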
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining the compiler with minimal library use, achieves state-of-the-art performance.
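The flavor of transformation such compilers derive automatically can be shown by hand: below is a tiled matrix-multiplication loop nest in which data-reuse analysis motivates blocking so that T x T tiles stay in cache. The tile size and loop order are illustrative assumptions; PolyDL's actual code generation and search are far more involved.

public class TiledMatMul {
    // C += A * B with T x T tiling; each tile of A and B is reused many
    // times while resident in cache, which is what reuse analysis exposes.
    static void multiply(double[][] a, double[][] b, double[][] c, int T) {
        int n = c.length;
        for (int ii = 0; ii < n; ii += T)
            for (int kk = 0; kk < n; kk += T)
                for (int jj = 0; jj < n; jj += T)
                    for (int i = ii; i < Math.min(ii + T, n); i++)
                        for (int k = kk; k < Math.min(kk + T, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + T, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
    }

    public static void main(String[] args) {
        int n = 4;
        double[][] a = new double[n][n], b = new double[n][n], c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = 1; b[i][j] = j; }
        multiply(a, b, c, 2);
        System.out.println(java.util.Arrays.deepToString(c)); // rows: 0, 4, 8, 12
    }
}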
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- Sketching Transformed Matrices with Applications to Natural Language Processing [76.6222695417524]
We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
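A generic form of the sketching idea (not the paper's specific algorithm for transformed matrices) approximates a matrix product A^T B by (S A)^T (S B), where S is a small random-sign sketch; in expectation S^T S is the identity, so accuracy trades off against a much smaller working set. The sizes and the sign sketch below are illustrative assumptions.

import java.util.Random;

public class SketchedProduct {
    public static void main(String[] args) {
        int n = 500, d = 4, s = 100;             // n rows, sketched down to s
        Random rnd = new Random(1);
        double[][] a = new double[n][d], b = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) {
                a[i][j] = rnd.nextGaussian();
                b[i][j] = a[i][j] + 0.1 * rnd.nextGaussian(); // correlated inputs
            }
        // Apply S (s x n, entries +-1/sqrt(s)) to both matrices row by row.
        double[][] sa = new double[s][d], sb = new double[s][d];
        for (int r = 0; r < s; r++)
            for (int i = 0; i < n; i++) {
                double w = (rnd.nextBoolean() ? 1 : -1) / Math.sqrt(s);
                for (int j = 0; j < d; j++) {
                    sa[r][j] += w * a[i][j];
                    sb[r][j] += w * b[i][j];
                }
            }
        // Compare one entry of (S A)^T (S B) against the exact A^T B.
        double exact = 0, approx = 0;
        for (int i = 0; i < n; i++) exact += a[i][0] * b[i][0];
        for (int r = 0; r < s; r++) approx += sa[r][0] * sb[r][0];
        System.out.printf("exact=%.1f approx=%.1f%n", exact, approx);
    }
}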
arXiv Detail & Related papers (2020-02-23T03:07:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.