Tensor Processing Primitives: A Programming Abstraction for Efficiency
and Portability in Deep Learning Workloads
- URL: http://arxiv.org/abs/2104.05755v2
- Date: Wed, 14 Apr 2021 15:38:38 GMT
- Title: Tensor Processing Primitives: A Programming Abstraction for Efficiency
and Portability in Deep Learning Workloads
- Authors: Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem
Adelman, Cristina Anderson, Alexander Breuer, Narendra Chaudhary, Abhisek
Kundu, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh
Ziv, Alexander Heinecke
- Abstract summary: This work introduces the Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of Deep Learning workloads with high productivity.
TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual ISA), which can be utilized as building-blocks to construct complex operators on high-dimensional tensors.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end DL-workloads expressed entirely via TPPs that outperform state-of-the-art implementations on multiple platforms.
- Score: 86.62083829086393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During the past decade, novel Deep Learning (DL) algorithms/workloads and
hardware have been developed to tackle a wide range of problems. Despite the
advances in workload/hardware ecosystems, the programming methodology of
DL-systems is stagnant. DL-workloads leverage either highly-optimized, yet
platform-specific and inflexible kernels from DL-libraries, or in the case of
novel operators, reference implementations built via DL-framework
primitives with underwhelming performance. This work introduces the Tensor
Processing Primitives (TPP), a programming abstraction striving for efficient,
portable implementation of DL-workloads with high productivity. TPPs define a
compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA),
which subsequently can be utilized as building-blocks to construct complex
operators on high-dimensional tensors. The TPP specification is
platform-agnostic, thus code expressed via TPPs is portable, whereas the TPP
implementation is highly-optimized and platform-specific. We demonstrate the
efficacy of our approach using standalone kernels and end-to-end DL-workloads
expressed entirely via TPPs that outperform state-of-the-art implementations on
multiple platforms.
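The abstract's core idea, composing complex operators on high-dimensional tensors out of a small, versatile set of 2D-tensor primitives, can be illustrated with a short sketch. The C snippet below is a minimal, scalar illustration written for this page: the tpp_zero/tpp_gemm/tpp_relu helpers are hypothetical stand-ins for TPP-style 2D operators, not the paper's (or LIBXSMM's) actual API, and the tensor sizes are arbitrary.

```c
/* Hypothetical sketch: composing a higher-dimensional operator from
 * 2D "TPP-style" building blocks. The tpp_* helpers are stand-ins
 * written for this example; they are NOT the actual TPP/LIBXSMM API. */
#include <stdio.h>
#include <stdlib.h>

/* 2D primitive: C[m][n] += A[m][k] * B[k][n] (plain GEMM core) */
static void tpp_gemm(int m, int n, int k,
                     const float *A, const float *B, float *C) {
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = C[i * n + j];
      for (int p = 0; p < k; ++p) acc += A[i * k + p] * B[p * n + j];
      C[i * n + j] = acc;
    }
}

/* 2D unary primitive: out = ReLU(in), applied to an m x n tile */
static void tpp_relu(int m, int n, const float *in, float *out) {
  for (int i = 0; i < m * n; ++i) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

/* 2D unary primitive: zero-initialize an m x n tile */
static void tpp_zero(int m, int n, float *out) {
  for (int i = 0; i < m * n; ++i) out[i] = 0.0f;
}

int main(void) {
  /* Toy fully-connected layer Y = ReLU(X * W) over a batched 3D tensor:
   * X is [B][M][K], W is [K][N], Y is [B][M][N]. The "complex" operator
   * is just a loop nest over batch entries; all the math is in 2D TPPs. */
  enum { B = 4, M = 8, K = 16, N = 8 };
  float *X = malloc(sizeof(float) * B * M * K);
  float *W = malloc(sizeof(float) * K * N);
  float *Y = malloc(sizeof(float) * B * M * N);
  for (int i = 0; i < B * M * K; ++i) X[i] = (float)(i % 7) - 3.0f;
  for (int i = 0; i < K * N; ++i)     W[i] = (float)(i % 5) - 2.0f;

  for (int b = 0; b < B; ++b) {               /* outer "logical" loop   */
    float *y = Y + b * M * N;
    tpp_zero(M, N, y);                        /* 2D init primitive      */
    tpp_gemm(M, N, K, X + b * M * K, W, y);   /* 2D contraction TPP     */
    tpp_relu(M, N, y, y);                     /* 2D unary (ReLU) TPP    */
  }

  printf("Y[0][0][0] = %f\n", Y[0]);
  free(X); free(W); free(Y);
  return 0;
}
```

In an actual TPP implementation the 2D kernels would be JIT-generated and platform-specific, while the surrounding loop structure, like the batch loop above, stays platform-agnostic; the sketch only mirrors that division of labor.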
Related papers
- SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models [53.638791265113625]
SPP is a sparsity-preserved and parameter-efficient fine-tuning method for large language models.
Code will be made available at https://github.com/Lucky-Lance/SPP.
arXiv Detail & Related papers (2024-05-25T04:55:27Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around the TPPs in a high-level, declarative fashion (see the sketch after this list).
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training [17.556432199389615]
Slapo is a schedule language that decouples the execution of a tensor-level operator from its arithmetic definition.
We show that Slapo can improve training throughput by up to 2.92x on a single machine with 8 NVIDIA V100 GPUs.
arXiv Detail & Related papers (2023-02-16T00:34:53Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal with respect to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Scalable Deep-Learning-Accelerated Topology Optimization for Additively Manufactured Materials [4.221095652322005]
Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices.
To address the high computational cost of TO, we propose a general, scalable deep-learning (DL) based TO framework, referred to as SDL-TO.
The framework accelerates TO by learning from iterative history data while simultaneously training on the mapping between a given design and its gradient.
arXiv Detail & Related papers (2020-11-28T17:38:31Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that this hybrid approach of a compiler plus minimal library use results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- The Deep Learning Compiler: A Comprehensive Survey [16.19025439622745]
We perform a comprehensive survey of existing DL compilers by dissecting the commonly adopted designs in detail.
Specifically, we provide a comprehensive comparison among existing DL compilers from various aspects.
arXiv Detail & Related papers (2020-02-06T07:29:08Z)
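As a rough illustration of the two-step decomposition described in the "Harnessing Deep Learning and HPC Kernels" entry above (a TPP computational core plus declaratively specified logical loops), here is a hypothetical C sketch. The loop_spec_t table, run_loops driver, and gemm_tile_core stub are invented for this example and do not reflect that framework's real interface; they only show how a loop nest can be given as data while the 2D core stays a black box.

```c
/* Hypothetical sketch of the two-step decomposition:
 * (1) a TPP-style 2D core, and (2) the logical loops around it given as
 * data (a declarative spec) rather than hand-written control flow.
 * Names and the spec format are invented for this example. */
#include <stdio.h>

typedef struct { const char *name; int trip; } loop_spec_t;
typedef void (*tpp_core_fn)(const int *idx, void *state);

/* Generic driver: recursively materializes the loop nest from the spec. */
static void run_loops(const loop_spec_t *spec, int depth, int ndims,
                      int *idx, tpp_core_fn core, void *state) {
  if (depth == ndims) { core(idx, state); return; }
  for (idx[depth] = 0; idx[depth] < spec[depth].trip; ++idx[depth])
    run_loops(spec, depth + 1, ndims, idx, core, state);
}

/* Stand-in for a TPP computational core (e.g., a small GEMM tile). */
static void gemm_tile_core(const int *idx, void *state) {
  int *calls = (int *)state;
  ++*calls;              /* real code would invoke a JIT-ed 2D kernel here */
  (void)idx;
}

int main(void) {
  /* "Declarative" part: which logical loops exist and their trip counts.
   * Reordering this table changes the schedule without touching the core. */
  loop_spec_t spec[] = { {"n_blocks", 4}, {"k_blocks", 8}, {"c_blocks", 8} };
  int idx[3] = {0, 0, 0};
  int calls = 0;
  run_loops(spec, 0, 3, idx, gemm_tile_core, &calls);
  printf("core invoked %d times\n", calls);  /* 4 * 8 * 8 = 256 */
  return 0;
}
```

Reordering or annotating the entries of spec (e.g., for blocking or parallelization) changes the schedule without touching the core, which is the separation of concerns that entry describes.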
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.