The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal
Padding
- URL: http://arxiv.org/abs/2110.10221v1
- Date: Tue, 19 Oct 2021 19:39:04 GMT
- Title: The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal
Padding
- Authors: Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry
- Abstract summary: CoRa is a tensor compiler that allows users to easily generate efficient code for ragged tensor operators.
We evaluate CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model.
- Score: 14.635810503599759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is often variation in the shape and size of input data used for deep
learning. In many cases, such data can be represented using tensors with
non-uniform shapes, or ragged tensors. Due to limited and non-portable support
for efficient execution on ragged tensors, current deep learning frameworks
generally use techniques such as padding and masking to make the data shapes
uniform and then offload the computations to optimized kernels for dense tensor
algebra. Such techniques can, however, lead to a lot of wasted computation and
therefore, a loss in performance. This paper presents CoRa, a tensor compiler
that allows users to easily generate efficient code for ragged tensor operators
targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of
operators on ragged tensors as well as on an encoder layer of the transformer
model, we find that CoRa (i) performs competitively with hand-optimized
implementations of the operators and the transformer encoder and (ii) achieves,
over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a
1.86X geomean speedup for the multi-head attention module used in transformers
on an ARM CPU.
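As a back-of-the-envelope illustration of the padding overhead described above, the sketch below pads a ragged batch to the maximum sequence length and reports how much of a dense matrix multiply is spent on padding. The batch size, sequence lengths, and hidden size are assumed for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative ragged batch: per-sequence lengths are assumed values.
seq_lens = [12, 37, 64, 9, 51]
hidden = 8

# Ragged representation: one (length_i, hidden) matrix per sequence.
ragged = [np.random.rand(n, hidden) for n in seq_lens]

# Dense-framework workaround: pad every sequence to the batch maximum length.
max_len = max(seq_lens)
padded = np.zeros((len(seq_lens), max_len, hidden))
for i, x in enumerate(ragged):
    padded[i, : len(x)] = x

# A dense kernel (here a simple projection) now runs over padded positions too.
weight = np.random.rand(hidden, hidden)
out = padded @ weight  # shape: (batch, max_len, hidden)

# Fraction of the work spent on padding rather than on real tokens.
useful = sum(seq_lens)
total = len(seq_lens) * max_len
print(f"wasted computation: {1 - useful / total:.1%}")
```

Avoiding this wasted work, without resorting to padding or masking, is the kind of overhead CoRa's ragged-tensor code generation is designed to eliminate.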
Related papers
- FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers [6.194917248699324]
This paper proposes a new technique for deep learning compilers called FTuner.
Experiments show that FTuner achieves operator and end-to-end performance comparable to vendor libraries.
arXiv Detail & Related papers (2024-07-31T08:05:33Z)
- Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores [47.87810316745786]
We propose a compression-based tensor decomposition framework, namely the exascale-tensor, to support exascale tensor decomposition.
Compared to the baselines, the exascale-tensor supports 8,000x larger tensors and a speedup of up to 6.95x.
We also apply our method to two real-world applications, including gene analysis and tensor layer neural networks.
arXiv Detail & Related papers (2023-11-22T21:04:59Z)
- TensorKrowch: Smooth integration of tensor networks in machine learning [46.0920431279359]
We introduce TensorKrowch, an open-source Python library built on top of PyTorch.
TensorKrowch allows users to construct any tensor network, train it, and integrate it as a layer in more intricate deep learning models.
arXiv Detail & Related papers (2023-06-14T15:55:19Z)
- Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-01T04:00:38Z)
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs [11.338285393619042]
We propose to embed the scheduling process into tensor programs and use dedicated mappings, called task mappings, to define the computation assignment and ordering.
With the proposed paradigm, we implement a deep learning compiler - Hidet.
arXiv Detail & Related papers (2022-10-18T05:32:13Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT). (A numerical sketch of this Toeplitz-FFT identity follows the list below.)
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only.
The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712x speedup with a 1301.25x energy reduction on CPU, and a 35.4x speedup with a 17.66x energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- UNIT: Unifying Tensorized Instruction Compilation [11.193044425743981]
Hardware vendors offer tensorized instructions for mixed-precision operations, like Intel VNNI, Nvidia Tensor Core, and ARM DOT.
The lack of compilation techniques for these instructions makes them hard to utilize.
We develop a compiler framework to unify the compilation for these instructions.
arXiv Detail & Related papers (2021-01-21T06:22:58Z)
- Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
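The kernelized-attention entry above rests on the fact that a relative-positional-encoding matrix is Toeplitz (its entries depend only on i - j), so multiplying it by a vector can be done with the FFT. The sketch below is a minimal numerical check of that identity via circulant embedding; it is not the paper's attention kernel, and the sizes and values are illustrative.

```python
import numpy as np

n = 8
# A Toeplitz matrix is fixed by its first column c and first row r (with r[0] == c[0]).
c = np.random.rand(n)                 # first column
r = np.random.rand(n); r[0] = c[0]    # first row

# Dense reference: T[i, j] = c[i - j] if i >= j else r[j - i].
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])

x = np.random.rand(n)

# Embed T in a 2n x 2n circulant matrix; a circulant matvec is a circular
# convolution, which the FFT diagonalizes, giving O(n log n) instead of O(n^2).
col = np.concatenate([c, [0.0], r[1:][::-1]])   # first column of the circulant
y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(n)])))[:n].real

print(np.allclose(T @ x, y))  # True: FFT result matches the dense matvec
```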