PolyScientist: Automatic Loop Transformations Combined with Microkernels
for Optimization of Deep Learning Primitives
- URL: http://arxiv.org/abs/2002.02145v1
- Date: Thu, 6 Feb 2020 08:02:34 GMT
- Title: PolyScientist: Automatic Loop Transformations Combined with Microkernels
for Optimization of Deep Learning Primitives
- Authors: Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep
Goyal, Ramakrishna Upadrasta, Bharat Kaul
- Abstract summary: We develop a hybrid solution to the development of deep learning kernels.
We use advanced polyhedral technology to automatically tune the outer loops for performance.
- Score: 55.79741270235602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At the heart of deep learning training and inferencing are computationally
intensive primitives such as convolutions which form the building blocks of
deep neural networks. Researchers have taken two distinct approaches to
creating high-performance implementations of deep learning kernels, namely, 1)
library development, exemplified by Intel MKL-DNN for CPUs, and 2) automatic
compilation, represented by the TensorFlow XLA compiler. The two approaches have
their drawbacks: even though a custom-built library can deliver very good
performance, the cost and time of developing the library can be high.
Automatic compilation of kernels is attractive, but in practice, to date,
automatically generated implementations lag expert-coded kernels in performance
by orders of magnitude.
In this paper, we develop a hybrid solution to the development of deep
learning kernels that achieves the best of both worlds: expert-coded
microkernels are used for the innermost loops of kernels, and we use
advanced polyhedral technology to automatically tune the outer loops for
performance. We design a novel polyhedral-model-based data reuse algorithm to
optimize the outer loops of the kernel. Through experimental evaluation on an
important class of deep learning primitives, namely convolutions, we demonstrate
that the approach we develop attains the same levels of performance as Intel
MKL-DNN, a hand-coded deep learning library.
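To make the hybrid scheme concrete, the sketch below shows a convolution whose outer loops are ordinary C loops (the part a polyhedral data-reuse analysis is free to reorder and tile) while the innermost computation is handed to an expert-coded microkernel. It is a minimal illustration under assumed blocked tensor layouts, with a plain-C stand-in named gemm_microkernel in place of a real hand-tuned routine; it is not the paper's code and not the MKL-DNN API.

```c
#include <stddef.h>

/* Hypothetical expert-coded microkernel: a small GEMM that updates a row of
 * Q output pixels for one (input-block, output-block) channel pair.  In
 * practice this would be a hand-tuned, vectorized routine supplied by a
 * library; here it is a plain-C stand-in. */
static void gemm_microkernel(const float *in, const float *wt, float *out,
                             int bc, int bk, int q_len) {
  for (int q = 0; q < q_len; ++q)      /* output pixels along the row */
    for (int k = 0; k < bk; ++k)       /* output-channel block        */
      for (int c = 0; c < bc; ++c)     /* input-channel block         */
        out[q * bk + k] += in[q * bc + c] * wt[c * bk + k];
}

/* Outer loops of a stride-1, padding-free convolution over blocked tensors
 *   input  I[N][Cb][H][Wd][bc],  weights W[Kb][Cb][R][S][bc][bk],
 *   output O[N][Kb][P][Q][bk]   (O zero-initialized by the caller).
 * The order and tiling of these loops is what the polyhedral data-reuse
 * analysis is free to change; the microkernel body is left untouched. */
void conv2d_hybrid(const float *I, const float *W, float *O,
                   int N, int Cb, int Kb, int H, int Wd,
                   int P, int Q, int R, int S, int bc, int bk) {
  for (int n = 0; n < N; ++n)
    for (int kb = 0; kb < Kb; ++kb)
      for (int cb = 0; cb < Cb; ++cb)
        for (int p = 0; p < P; ++p)
          for (int r = 0; r < R; ++r)
            for (int s = 0; s < S; ++s) {
              size_t in_off  = (((size_t)n * Cb + cb) * H + (p + r)) * Wd + s;
              size_t wt_off  = ((((size_t)kb * Cb + cb) * R + r) * S + s) * (size_t)bc;
              size_t out_off = (((size_t)n * Kb + kb) * P + p) * Q;
              gemm_microkernel(I + in_off * bc, W + wt_off * bk,
                               O + out_off * bk, bc, bk, Q);
            }
}
```

A polyhedral optimizer would, for instance, interchange the cb and p loops or tile p and q so that the input and weight blocks touched by successive microkernel calls stay resident in cache; the microkernel body itself is never modified.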
Related papers
- Accelerating Machine Learning Primitives on Commodity Hardware [0.0]
We present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs).
Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators.
This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware.
arXiv Detail & Related papers (2023-10-08T16:26:18Z)
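For context on the contrast drawn in the entry above: GEMM-based convolution first materializes an im2col buffer and then calls a matrix multiplication, whereas sliding-window convolution computes the result directly from the input with a plain loop nest and no intermediate copy. The toy routine below sketches the sliding-window form under assumed NCHW layout, stride 1, and no padding; it is not the paper's implementation.

```c
/* Toy sliding-window (direct) 2-D convolution: NCHW layout, stride 1, no
 * padding.  No im2col buffer of size C*R*S x P*Q is materialized; the
 * R x S window simply slides over the input.  O must be zero-initialized
 * by the caller. */
void conv2d_sliding_window(const float *I, const float *W, float *O,
                           int N, int C, int K, int H, int Wd,
                           int R, int S) {
  int P = H - R + 1, Q = Wd - S + 1;          /* output spatial extent */
  for (int n = 0; n < N; ++n)
    for (int k = 0; k < K; ++k)
      for (int c = 0; c < C; ++c)
        for (int p = 0; p < P; ++p)
          for (int q = 0; q < Q; ++q)
            for (int r = 0; r < R; ++r)
              for (int s = 0; s < S; ++s)
                O[((n * K + k) * P + p) * Q + q] +=
                    I[((n * C + c) * H + (p + r)) * Wd + (q + s)] *
                    W[((k * C + c) * R + r) * S + s];
}
```

The saving is the omitted C*R*S x P*Q im2col buffer and the memory traffic to fill it, at the cost of forgoing a ready-made, highly tuned GEMM routine.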
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
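As a rough illustration of the two-step decomposition described in the entry above, the sketch below separates a stand-in Tensor Processing Primitive (a tiny block GEMM) from a declarative description of the logical loops around it. The struct names, the odometer-style driver, and the example strides are illustrative assumptions, not the framework's actual API.

```c
#include <stddef.h>

/* Step 1: the computational core as a stand-in Tensor Processing Primitive,
 * here a tiny block GEMM with explicit leading dimensions.  A real TPP would
 * be a hand-tuned or JIT-generated, architecture-specific kernel. */
typedef struct { int m, n, k, lda, ldb, ldc; } tpp_gemm_t;

static void tpp_gemm(const tpp_gemm_t *t, const float *A, const float *B,
                     float *C) {
  for (int i = 0; i < t->m; ++i)
    for (int j = 0; j < t->n; ++j)
      for (int l = 0; l < t->k; ++l)
        C[i * t->ldc + j] += A[i * t->lda + l] * B[l * t->ldb + j];
}

/* Step 2: the logical loops around the TPP, declared as a list of trip
 * counts and per-iteration operand offsets, outermost loop first (at most
 * 16 loops in this sketch). */
typedef struct {
  const char *name;
  int trips;
  ptrdiff_t a_stride, b_stride, c_stride;
} logical_loop_t;

/* Odometer-style driver: walks the declared nest and calls the primitive
 * once per iteration point. */
static void run_loops(const logical_loop_t *loops, int nloops,
                      const tpp_gemm_t *tpp, const float *A, const float *B,
                      float *C) {
  int idx[16] = {0};
  for (;;) {
    const float *a = A; const float *b = B; float *c = C;
    for (int d = 0; d < nloops; ++d) {
      a += idx[d] * loops[d].a_stride;
      b += idx[d] * loops[d].b_stride;
      c += idx[d] * loops[d].c_stride;
    }
    tpp_gemm(tpp, a, b, c);
    int d = nloops - 1;                       /* advance the odometer */
    while (d >= 0 && ++idx[d] == loops[d].trips) idx[d--] = 0;
    if (d < 0) break;
  }
}

/* Example: C[256][256] += A[256][256] * B[256][256] in 32x32x32 blocks
 * (C zero-initialized by the caller):
 *
 *   tpp_gemm_t tpp = {32, 32, 32, 256, 256, 256};
 *   logical_loop_t nest[] = {
 *     {"mb", 8, 32 * 256, 0,        32 * 256},
 *     {"nb", 8, 0,        32,       32      },
 *     {"kb", 8, 32,       32 * 256, 0       },
 *   };
 *   run_loops(nest, 3, &tpp, A, B, C);
 */
```

Reordering or re-blocking the computation then amounts to editing the nest specification alone; the primitive stays fixed.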
- oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation [8.64220475114214]
oneDNN Graph Compiler employs a hybrid approach, using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation.
Experimental results demonstrate significant performance gains over existing tensor compilers and primitives libraries for performance-critical computation graphs.
arXiv Detail & Related papers (2023-01-03T19:52:17Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning [10.145707529307462]
In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks.
Our approach at least matches Intel MKL-DNN and in some cases outperforms it by 5x (on multicore CPU).
arXiv Detail & Related papers (2020-05-07T07:27:08Z)