Performance portability through machine learning guided kernel selection
in SYCL libraries
- URL: http://arxiv.org/abs/2008.13145v1
- Date: Sun, 30 Aug 2020 11:44:37 GMT
- Title: Performance portability through machine learning guided kernel selection
in SYCL libraries
- Authors: John Lawson
- Abstract summary: General purpose compute libraries must be able to cater to all inputs and parameters provided by a user. Machine learning methods can be used to mitigate the resulting tuning and binary-size problems, and because the process is fully automated, tuning for new hardware or problems requires no developer effort or expertise.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically tuning parallel compute kernels allows libraries and frameworks
to achieve performance on a wide range of hardware; however, these techniques
are typically focused on finding optimal kernel parameters for particular input
sizes and parameters. General purpose compute libraries must be able to cater
to all inputs and parameters provided by a user, and so these techniques are of
limited use. Additionally, parallel programming frameworks such as SYCL require
that the kernels be deployed in a binary format embedded within the library. As
such it is impractical to deploy a large number of possible kernel
configurations without inflating the library size.
Machine learning methods can be used to mitigate both of these
problems and provide performance for general purpose routines with a limited
number of kernel configurations. We show that unsupervised clustering methods
can be used to select a subset of the possible kernels that should be deployed
and that simple classification methods can be trained to select from these
kernels at runtime to give good performance. As these techniques are fully
automated, relying only on benchmark data, the tuning process for new hardware
or problems does not require any developer effort or expertise.
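The pipeline described above — cluster the benchmarked kernel configurations to pick a small deployable subset, then train a lightweight classifier to choose among them at runtime — can be sketched with scikit-learn. Everything below (the synthetic benchmark data, the GEMM-style problem features, the cluster count, the decision tree) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): pick a small set of
# kernel configurations via clustering, then train a classifier that
# chooses one of the deployed configurations at runtime.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Assumed setup: 200 benchmark problems (e.g. GEMM sizes M, N, K) measured
# against 64 candidate kernel configurations; higher numbers are better.
n_problems, n_configs, n_deploy = 200, 64, 8
problem_features = rng.integers(32, 4096, size=(n_problems, 3)).astype(float)
perf = rng.random((n_problems, n_configs))

# Step 1: cluster configurations by their performance profile across all
# problems and deploy one representative per cluster. This bounds the
# number of kernel binaries embedded in the library.
kmeans = KMeans(n_clusters=n_deploy, n_init=10, random_state=0).fit(perf.T)
deployed = []
for c in range(n_deploy):
    members = np.where(kmeans.labels_ == c)[0]
    deployed.append(members[np.argmax(perf[:, members].mean(axis=0))])
deployed = np.array(deployed)

# Step 2: label each problem with its best deployed configuration and train
# a cheap classifier to reproduce that choice from problem features alone.
labels = deployed[np.argmax(perf[:, deployed], axis=1)]
clf = DecisionTreeClassifier(max_depth=6, random_state=0)
clf.fit(np.log2(problem_features), labels)

# Runtime dispatch: map an incoming problem to one of the deployed kernels.
print(clf.predict(np.log2([[1024.0, 512.0, 256.0]])))
```

In a SYCL library only the selected configurations would be compiled into the binary, and evaluating the tree at dispatch time costs just a handful of comparisons.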
Related papers
- Amortized Inference for Gaussian Process Hyperparameters of Structured Kernels [5.1672267755831705]
Amortizing parameter inference over different datasets is a promising approach to dramatically speed up training time.
We propose amortizing kernel parameter inference over a complete kernel-structure-family rather than a fixed kernel structure.
We show drastically reduced inference time combined with competitive test performance for a large set of kernels and datasets.
arXiv Detail & Related papers (2023-06-16T13:02:57Z)
- AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set that closely resembles the loss function.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
arXiv Detail & Related papers (2023-05-19T19:59:52Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
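The two-step decomposition this entry describes can be illustrated with a toy sketch: a hand-written micro-kernel stands in for a Tensor Processing Primitive, and plain outer loops stand in for the declarative loop layer. This is an illustration of the idea only, not the framework's API.

```python
# Toy illustration of the two-step decomposition (not the framework's API):
# a micro-kernel plays the role of a TPP, and the outer loops express the
# logical blocking around it.
import numpy as np

def microkernel_gemm(a_block, b_block, c_block):
    """Computational core (the "TPP"): C_block += A_block @ B_block."""
    c_block += a_block @ b_block

def blocked_matmul(a, b, block=32):
    """Logical loops around the micro-kernel; only blocking lives here."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n))
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                microkernel_gemm(a[i:i+block, p:p+block],
                                 b[p:p+block, j:j+block],
                                 c[i:i+block, j:j+block])
    return c

a, b = np.random.rand(128, 96), np.random.rand(96, 64)
assert np.allclose(blocked_matmul(a, b), a @ b)
```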
- BioSequence2Vec: Efficient Embedding Generation For Biological Sequences [1.0896567381206714]
We propose a general-purpose representation learning approach that embodies the qualities of kernel methods while avoiding their computation, memory, and generalizability challenges.
Our proposed fast and alignment-free embedding method can be used as input to any distance-based machine learning method.
We perform a variety of real-world classification tasks, such as SARS-CoV-2 lineage and gene family classification, outperforming several state-of-the-art embedding and kernel methods in predictive performance.
arXiv Detail & Related papers (2023-04-01T10:58:21Z)
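As a rough illustration of an alignment-free, fixed-length sequence embedding, the sketch below maps a sequence to a normalized k-mer frequency vector; BioSequence2Vec's actual construction differs, but the resulting vectors can similarly be fed to any distance-based method.

```python
# Minimal sketch of an alignment-free k-mer frequency embedding (an
# illustration of the general idea, not BioSequence2Vec itself).
from itertools import product
import numpy as np

ALPHABET = "ACGT"
K = 3
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}

def embed(seq: str) -> np.ndarray:
    """Fixed-length vector of normalized k-mer counts."""
    v = np.zeros(len(KMER_INDEX))
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i+K])
        if idx is not None:  # skip k-mers containing ambiguous characters
            v[idx] += 1
    total = v.sum()
    return v / total if total else v

# Embeddings are directly comparable under any distance measure.
x, y = embed("ACGTACGTGG"), embed("ACGTTCGTGG")
print(np.linalg.norm(x - y))
```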
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) aims to achieve optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel- or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms state-of-the-art cost modeling for Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
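A learned cost model of this general kind can be sketched as a small regressor from task descriptors to a cost estimate; the features, targets, and network below are illustrative stand-ins, not VPUNN's inputs or architecture.

```python
# Minimal sketch of a learned cost model (illustrative, not VPUNN itself):
# regress a cost estimate from simple numeric task descriptors.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Assumed profiling data: five descriptors per task (e.g. operation kind,
# tensor dimensions, datatype width) with a synthetic cost target.
X = rng.integers(1, 512, size=(2000, 5)).astype(float)
y = np.log1p(X).sum(axis=1) + rng.normal(0.0, 0.1, size=2000)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                     random_state=0).fit(np.log1p(X), y)

# A compiler can rank candidate schedules by predicted cost instead of
# profiling each one on hardware.
candidates = np.log1p(rng.integers(1, 512, size=(4, 5)).astype(float))
print(model.predict(candidates))
```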
- Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers [5.4352987210173955]
This paper aims to make the software toolchain smarter so that modern architectures are exploited as effectively as possible.
In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, leading to minimum energy consumption.
Experiments show that using machine learning models on the source code to select the best energy scaling configuration automatically is viable and has the potential to be used in the context of automatic system configuration for energy minimisation.
arXiv Detail & Related papers (2020-12-12T15:12:03Z)
- Towards automated kernel selection in machine learning systems: A SYCL case study [0.0]
We present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries.
By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
arXiv Detail & Related papers (2020-03-15T11:23:36Z)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
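The statistic underlying such tests is the maximum mean discrepancy (MMD) computed with a kernel on learned features. The sketch below evaluates a (biased) MMD^2 estimate with a Gaussian kernel over a fixed random feature map, omitting the test-power training that is the paper's actual contribution.

```python
# Minimal sketch of a kernel two-sample statistic with a "deep" feature map
# (fixed random features here; the paper trains the map to maximize power).
import numpy as np

rng = np.random.default_rng(0)

def features(x, w, b):
    """Stand-in for a deep network: one random nonlinear layer."""
    return np.tanh(x @ w + b)

def gaussian_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, w, b):
    """Biased MMD^2 estimate between samples x and y in feature space."""
    fx, fy = features(x, w, b), features(y, w, b)
    return (gaussian_kernel(fx, fx).mean()
            + gaussian_kernel(fy, fy).mean()
            - 2 * gaussian_kernel(fx, fy).mean())

w, b = rng.normal(size=(2, 16)), rng.normal(size=16)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)), w, b)
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(2.0, 1.0, size=(200, 2)), w, b)
print(same, diff)  # the statistic is near zero when distributions match
```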
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid approach to developing deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.