TensorIR: An Abstraction for Automatic Tensorized Program Optimization
- URL: http://arxiv.org/abs/2207.04296v1
- Date: Sat, 9 Jul 2022 16:28:57 GMT
- Title: TensorIR: An Abstraction for Automatic Tensorized Program Optimization
- Authors: Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang
Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, and Tianqi Chen
- Abstract summary: We present TensorIR, a compiler abstraction for optimizing programs with tensor computation primitives.
We build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.
- Score: 22.812702519665617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying deep learning models on various devices has become an important
topic. The wave of hardware specialization brings a diverse set of acceleration
primitives for multi-dimensional tensor computations. These new acceleration
primitives, along with the emerging machine learning models, bring tremendous
engineering challenges. In this paper, we present TensorIR, a compiler
abstraction for optimizing programs with these tensor computation primitives.
TensorIR generalizes the loop nest representation used in existing machine
learning compilers to make tensor computation a first-class citizen.
Finally, we build an end-to-end framework on top of our abstraction to
automatically optimize deep learning models for given tensor computation
primitives. Experimental results show that TensorIR compilation automatically
uses the tensor computation primitives for given hardware backends and delivers
performance that is competitive with state-of-the-art hand-optimized systems across
platforms.
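For readers unfamiliar with the abstraction, the sketch below shows what a tensorizable matrix multiplication looks like in TVM's TensorIR script form (TVMScript), where the block construct is the first-class unit of tensor computation that schedules can later map onto hardware intrinsics. The decorator and helper names follow current open-source TVM releases and are an assumption about the released implementation, not text from the paper.
```python
# Minimal TensorIR sketch (TVMScript) of a 128x128 matmul, assuming current TVM
# APIs (tvm.script.tir); the block "C" is the unit a scheduler can later map
# onto hardware tensor intrinsics.
from tvm.script import tir as T

@T.prim_func
def matmul(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.match_buffer(b, (128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):  # a "block": an isolated unit of tensorized computation
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])  # spatial, spatial, reduction
            with T.init():
                C[vi, vj] = T.float32(0)
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]
```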
Related papers
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning [19.79913796167022]
We present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads.
Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program.
We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models.
arXiv Detail & Related papers (2023-11-01T23:03:59Z)
- TensorKrowch: Smooth integration of tensor networks in machine learning [46.0920431279359]
We introduce TensorKrowch, an open source Python library built on top of PyTorch.
TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models.
arXiv Detail & Related papers (2023-06-14T15:55:19Z)
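As an illustration of what "a tensor network as a layer" means in practice, here is a small self-contained PyTorch sketch of a matrix-product-state style layer built with einsum contractions; it shows the idea only and deliberately does not reproduce TensorKrowch's actual class or argument names.
```python
# Hypothetical illustration of a tensor-network (MPS-style) layer in plain PyTorch;
# TensorKrowch's real classes and argument names are not reproduced here.
import torch
import torch.nn as nn

class TinyMPSLayer(nn.Module):
    """Contract n_sites feature vectors against an MPS to produce out_dim scores."""
    def __init__(self, n_sites: int, in_dim: int, out_dim: int, bond_dim: int):
        super().__init__()
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(bond_dim, in_dim, bond_dim)) for _ in range(n_sites)]
        )
        self.boundary = nn.Parameter(0.1 * torch.randn(bond_dim))
        self.output = nn.Parameter(0.1 * torch.randn(bond_dim, out_dim))

    def forward(self, x):  # x: (batch, n_sites, in_dim)
        state = self.boundary.expand(x.shape[0], -1)  # (batch, bond)
        for site, core in enumerate(self.cores):
            # contract this site's input vector into the running bond state
            mat = torch.einsum("bi,lir->blr", x[:, site, :], core)  # (batch, bond, bond)
            state = torch.einsum("bl,blr->br", state, mat)
        return state @ self.output  # (batch, out_dim)

# Usage: drop the layer into an ordinary PyTorch model.
layer = TinyMPSLayer(n_sites=8, in_dim=2, out_dim=10, bond_dim=4)
scores = layer(torch.rand(32, 8, 2))
```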
- oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation [8.64220475114214]
oneDNN Graph Compiler employs a hybrid approach of using techniques from both compiler optimization and expert-tuned kernels for high performance code generation.
Experimental results demonstrate significant performance gains over existing tensor compilers and primitives libraries for performance-critical computation graphs.
arXiv Detail & Related papers (2023-01-03T19:52:17Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
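For context on what the package provides, the sketch below shows typical snnTorch usage: spiking leaky integrate-and-fire layers composed with ordinary PyTorch modules and unrolled over discrete time steps. The API calls follow the public snnTorch documentation; the IPU-specific packaging described in the paper is not shown.
```python
# Minimal spiking-network sketch with snnTorch (generic CPU/GPU usage; the
# paper's IPU-optimized release is packaged separately and is not shown here).
import torch
import torch.nn as nn
import snntorch as snn

fc1 = nn.Linear(784, 100)
lif1 = snn.Leaky(beta=0.9)  # leaky integrate-and-fire neuron layer
fc2 = nn.Linear(100, 10)
lif2 = snn.Leaky(beta=0.9)

def forward(x_seq):  # x_seq: (time_steps, batch, 784)
    mem1 = lif1.init_leaky()  # initialize membrane potentials
    mem2 = lif2.init_leaky()
    spikes_out = []
    for x_t in x_seq:  # unroll the network over time
        spk1, mem1 = lif1(fc1(x_t), mem1)
        spk2, mem2 = lif2(fc2(spk1), mem2)
        spikes_out.append(spk2)
    return torch.stack(spikes_out)

out = forward(torch.rand(25, 32, 784))
```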
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs [11.338285393619042]
We propose to embed the scheduling process into tensor programs and use dedicated mappings, called task mappings, to define the computation assignment and ordering.
With the proposed paradigm, we implement a deep learning compiler - Hidet.
arXiv Detail & Related papers (2022-10-18T05:32:13Z)
- Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
- A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z)
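The learned-performance-model idea can be illustrated in a few lines: featurize candidate programs, regress measured runtimes, and let an autotuner rank unmeasured candidates by predicted cost. The features and model below are illustrative placeholders, not the paper's graph-based model.
```python
# Illustrative learned cost model: an MLP regressing runtime from hand-picked
# program features; the paper learns from graph representations of programs,
# which is not reproduced here.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# features: (num_programs, 16) per-candidate features; runtimes: measured seconds
features = torch.rand(1024, 16)
runtimes = torch.rand(1024, 1)

for _ in range(200):  # fit the cost model
    loss = nn.functional.mse_loss(model(features), runtimes)
    opt.zero_grad()
    loss.backward()
    opt.step()

# An autotuner can now rank unmeasured candidates by predicted runtime.
candidates = torch.rand(256, 16)
best = candidates[model(candidates).squeeze(1).argmin()]
```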
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but have so far been hard to apply to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra, and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
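One way to see how optimization, numerical linear algebra, and random projections combine is a Nystrom-style kernel ridge regression solver, which replaces the full n-by-n kernel matrix with a random subset of centers and solves a much smaller linear system; the sketch below is a generic version of that technique, not the paper's solver.
```python
# Nystrom-style kernel ridge regression: random centers plus a small linear
# solve, runnable on GPU by moving tensors to "cuda". A generic sketch only.
import torch

def rbf(a, b, gamma=0.5):
    return torch.exp(-gamma * torch.cdist(a, b).pow(2))

def nystrom_krr_fit(X, y, m=500, lam=1e-3):
    centers = X[torch.randperm(X.shape[0])[:m]]  # random subset of m centers
    K_nm = rbf(X, centers)                        # (n, m) cross-kernel
    K_mm = rbf(centers, centers)                  # (m, m) centers kernel
    # Solve (K_nm^T K_nm + lam * K_mm) alpha = K_nm^T y: an m x m system instead of n x n
    A = K_nm.T @ K_nm + lam * K_mm
    alpha = torch.linalg.solve(A, K_nm.T @ y)
    return centers, alpha

def nystrom_krr_predict(X_new, centers, alpha):
    return rbf(X_new, centers) @ alpha

X, y = torch.rand(20000, 10), torch.rand(20000, 1)
centers, alpha = nystrom_krr_fit(X, y)
y_hat = nystrom_krr_predict(torch.rand(5, 10), centers, alpha)
```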
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
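To make the correspondence concrete, here is a generic predictive coding sketch for a two-layer linear network: latent activities are relaxed to minimize local prediction errors with the output clamped to the target, and weights are then updated from purely local error signals. It is a textbook-style illustration, not the paper's exact translation procedure.
```python
# Generic predictive coding sketch for a 2-layer linear network (x0 -> x1 -> x2):
# relax the latent activity x1 to minimize local prediction errors, then update
# weights from those errors; not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((20, 10))  # predicts x1 from x0
W2 = 0.1 * rng.standard_normal((5, 20))   # predicts x2 from x1

def pc_step(W1, W2, x0, target, n_infer=20, lr_x=0.1, lr_w=0.01):
    x1 = W1 @ x0                       # initialize latent at the feedforward value
    x2 = target                        # clamp the output layer to the label
    for _ in range(n_infer):           # inference: relax latent activity
        e1 = x1 - W1 @ x0              # local prediction errors
        e2 = x2 - W2 @ x1
        x1 = x1 + lr_x * (-e1 + W2.T @ e2)  # descend the local energy w.r.t. x1
    e1, e2 = x1 - W1 @ x0, x2 - W2 @ x1
    W1 = W1 + lr_w * np.outer(e1, x0)  # purely local weight updates
    W2 = W2 + lr_w * np.outer(e2, x1)
    return W1, W2

W1, W2 = pc_step(W1, W2, rng.standard_normal(10), rng.standard_normal(5))
```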
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.