Towards automated kernel selection in machine learning systems: A SYCL case study
- URL: http://arxiv.org/abs/2003.06795v1
- Date: Sun, 15 Mar 2020 11:23:36 GMT
- Title: Towards automated kernel selection in machine learning systems: A SYCL case study
- Authors: John Lawson
- Abstract summary: We present initial results using machine learning to select kernels in a case study deploying high-performance SYCL kernels in libraries.
By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach suits deploying machine learning models, where the network topology is constant, but machine learning research often involves changing network topologies and hyperparameters. Traditional kernel auto-tuning has limited impact in this case; a more general selection of kernels is required for libraries to accelerate machine learning research.
In this paper we present initial results using machine learning to select kernels in a case study deploying high-performance SYCL kernels in libraries that target a range of heterogeneous devices, from desktop GPUs to embedded accelerators. The techniques investigated apply more generally and could similarly be integrated with other heterogeneous programming systems. By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
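The abstract describes choosing among pre-tuned kernel variants at runtime based on input sizes, using a model trained on auto-tuning data. As a rough illustration of what such a selection layer could look like in a C++ library, the sketch below uses a hand-written decision stump in place of a trained model; the variant names, feature set, and thresholds are hypothetical stand-ins, not the paper's actual implementation.

```cpp
// Minimal sketch of ML-guided kernel selection, assuming a library that
// ships several tuned variants of a matrix-multiply kernel. Variant names,
// features, and thresholds are hypothetical illustrations only.
#include <cstddef>

// Hypothetical kernel variants, each auto-tuned for a different region of
// the input space (tile sizes, vector widths, work-group shapes, ...).
enum class MatmulVariant { SmallTiled, TallSkinny, LargeBlocked };

// Features visible to the selector: only the problem sizes, so the choice
// can adapt as network topologies and hyperparameters change.
struct MatmulFeatures {
  std::size_t m, n, k;
};

// Stand-in for a trained classifier. In a real deployment this would be a
// model (e.g. a decision tree) fitted to auto-tuning benchmark results on
// each target device; a hand-written stump keeps the sketch self-contained.
inline MatmulVariant select_variant(const MatmulFeatures& f) {
  const std::size_t work = 2 * f.m * f.n * f.k;  // rough FLOP count
  if (f.m <= 16 || f.n <= 16)                    // thin matrices
    return MatmulVariant::TallSkinny;
  if (work < (std::size_t{1} << 24))             // small problems
    return MatmulVariant::SmallTiled;
  return MatmulVariant::LargeBlocked;            // large, blocked kernel
}
```

Because the selector consumes only cheap features of the input, its cost is negligible relative to a kernel launch, and retraining on a new device's auto-tuning data swaps in a new decision function without library code changes, which matches the abstract's claim of high performance on new hardware with little developer effort.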
Related papers
- Kernel Approximation using Analog In-Memory Computing [3.5231018007564203]
Kernel functions are vital ingredients of several machine learning algorithms, but often incur significant memory and computational costs.
We introduce an approach to kernel approximation in machine learning algorithms suitable for mixed-signal Analog In-Memory Computing (AIMC) architectures.
arXiv Detail & Related papers (2024-11-05T16:18:47Z)
- Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods focus on efficient training without considering catastrophic forgetting.
This paper proposes a simple but effective edge-friendly incremental learning framework.
Our method achieves an average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z)
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by an expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel- or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- mlf-core: a framework for deterministic machine learning [0.08795040582681389]
Major machine learning libraries default to non-deterministic algorithms based on atomic operations.
To overcome this shortcoming, various machine learning libraries released deterministic counterparts to the non-deterministic algorithms.
We developed a new software solution, mlf-core, which helps machine learning projects meet and maintain determinism requirements.
arXiv Detail & Related papers (2021-04-15T17:58:03Z)
- Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers [5.4352987210173955]
This paper aims to make the software toolchain smarter in order to exploit modern architectures in the best way.
In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance the number of cores, that leads to minimum energy consumption.
Experiments show that using machine learning models on source code to automatically select the best energy-scaling configuration is viable, with the potential to be used for automatic system configuration for energy minimisation.
arXiv Detail & Related papers (2020-12-12T15:12:03Z)
- Performance portability through machine learning guided kernel selection in SYCL libraries [0.0]
General-purpose compute libraries must be able to cater to all inputs and parameters provided by a user.
Machine learning methods can be used to mitigate these problems.
Tuning the process for new hardware or problems does not require any developer effort or expertise.
arXiv Detail & Related papers (2020-08-30T11:44:37Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid approach to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.