Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch
- URL: http://arxiv.org/abs/2202.09888v1
- Date: Sun, 20 Feb 2022 18:51:00 GMT
- Title: Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch
- Authors: Aravind Sankaran, Navid Akbari Alashti, Christos Psarras, Paolo
Bientinesi
- Abstract summary: We develop benchmarks to investigate the linear algebra optimization capabilities of TF and PyT.
In this work, we focus on linear algebra computations in TF and PyT.
- Score: 1.1470070927586016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear algebra operations, which are ubiquitous in machine learning, form
major performance bottlenecks. The High-Performance Computing community invests
significant effort in the development of architecture-specific optimized
kernels, such as those provided by the BLAS and LAPACK libraries, to speed up
linear algebra operations. However, end users are progressively less likely to
go through the error-prone and time-consuming process of directly using said
kernels; instead, frameworks such as TensorFlow (TF) and PyTorch (PyT), which
facilitate the development of machine learning applications, are becoming more
and more popular. Although such frameworks link to BLAS and LAPACK, it is not
clear whether or not they make use of linear algebra knowledge to speed up
computations. For this reason, in this paper we develop benchmarks to
investigate the linear algebra optimization capabilities of TF and PyT. Our
analyses reveal that a number of linear algebra optimizations are still
missing; for instance, reducing the number of scalar operations by applying the
distributive law, and automatically identifying the optimal parenthesization of
a matrix chain. In this work, we focus on linear algebra computations in TF and
PyT; we both expose opportunities for performance enhancement to the benefit of
the developers of the frameworks and provide end users with guidelines on how
to achieve performance gains.
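To make the two missed optimizations concrete, here is a minimal PyTorch sketch (our own illustration, not the paper's benchmark code; the matrix sizes are arbitrary) that times a three-matrix chain under both parenthesizations, and a product written with and without the distributive law:

```python
import time
import torch

# Hypothetical shapes (not from the paper), chosen so both effects are visible.
n, k = 4_000, 100
A = torch.randn(n, k)   # tall
B = torch.randn(k, n)   # wide
C = torch.randn(n, k)   # tall

def bench(fn, reps=5):
    """Crude wall-clock timing; one warm-up call absorbs one-time costs."""
    fn()
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

# 1) Matrix-chain parenthesization:
#    (A @ B) @ C needs ~2*n*n*k scalar multiplications,
#    A @ (B @ C) only ~2*n*k*k -- roughly n/k (about 40x here) fewer.
print(f"(A @ B) @ C : {bench(lambda: (A @ B) @ C):.4f} s")
print(f"A @ (B @ C) : {bench(lambda: A @ (B @ C)):.4f} s")

# 2) Distributive law:
#    A @ M + A @ N launches two large GEMMs,
#    A @ (M + N) launches one GEMM plus a cheap element-wise addition.
M = torch.randn(k, n)
N = torch.randn(k, n)
print(f"A @ M + A @ N : {bench(lambda: A @ M + A @ N):.4f} s")
print(f"A @ (M + N)   : {bench(lambda: A @ (M + N)):.4f} s")
```

With shapes like these, the cheaper parenthesization and the factored form avoid the large intermediate GEMMs, which is the kind of rewriting the benchmarks probe for. As a user-side guideline, writing the cheaper form explicitly, or using torch.linalg.multi_dot (or numpy.linalg.multi_dot) for chains, which reorders the multiplications based on operand shapes, recovers the gain without relying on the framework to do it automatically.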
Related papers
- CoLA: Exploiting Compositional Structure for Automatic and Efficient
Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA.
By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms.
We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
- GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed for neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- ML-driven Hardware Cost Model for MLIR [1.2987894327817158]
We develop a machine learning-based cost model for high-level MLIR.
By treating the incoming MLIR as text input, à la NLP models, we can apply well-known techniques from modern NLP research.
We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest.
arXiv Detail & Related papers (2023-02-14T11:32:47Z)
- TensorIR: An Abstraction for Automatic Tensorized Program Optimization [22.812702519665617]
We present TensorIR, a compiler abstraction for optimizing programs with tensor computation primitives.
We build an end-to-end framework on top of this abstraction to automatically optimize deep learning models for given tensor computation primitives.
arXiv Detail & Related papers (2022-07-09T16:28:57Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method achieves better performance with lower power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)