Bayesian Optimization for auto-tuning GPU kernels
- URL: http://arxiv.org/abs/2111.14991v1
- Date: Fri, 26 Nov 2021 11:26:26 GMT
- Title: Bayesian Optimization for auto-tuning GPU kernels
- Authors: Floris-Jan Willemsen, Rob van Nieuwpoort, Ben van Werkhoven
- Abstract summary: Finding optimal parameter configurations for GPU kernels is a non-trivial exercise for large search spaces, even when automated.
We introduce a novel contextual variance exploration factor, as well as new acquisition functions with improved scalability, combined with an informed acquisition function selection mechanism.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Finding optimal parameter configurations for tunable GPU kernels is a
non-trivial exercise for large search spaces, even when automated. This poses
an optimization task on a non-convex search space, using an expensive-to-evaluate
function with an unknown derivative. These characteristics make it a good
candidate for Bayesian Optimization, which has not been applied to this problem
before. However, the application of Bayesian Optimization to this problem is
challenging. We demonstrate how to deal with the rough, discrete, constrained
search spaces, containing invalid configurations. We introduce a novel
contextual variance exploration factor, as well as new acquisition functions
with improved scalability, combined with an informed acquisition function
selection mechanism. By comparing the performance of our Bayesian Optimization
implementation on various test cases to the existing search strategies in
Kernel Tuner, as well as other Bayesian Optimization implementations, we
demonstrate that our search strategies generalize well and consistently
outperform other search strategies by a wide margin.
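To make the setting concrete, below is a minimal sketch of Bayesian Optimization over a discrete, constrained kernel-parameter space. It is not the authors' implementation: the parameter names, the constraint, the stand-in runtime function, and the plain lower-confidence-bound acquisition with a fixed exploration factor (where the paper uses a contextual variance factor and custom acquisition functions) are illustrative assumptions.

```python
# Minimal sketch of Bayesian Optimization over a discrete, constrained GPU-kernel
# parameter space. The objective below stands in for a real kernel benchmark
# (e.g. as measured by Kernel Tuner); parameter names and the constraint are
# illustrative assumptions, not taken from the paper.
import itertools
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Tunable parameters and a constraint that rules out invalid configurations.
block_sizes_x = [16, 32, 64, 128, 256]
tile_sizes = [1, 2, 4, 8]
space = [(bx, t) for bx, t in itertools.product(block_sizes_x, tile_sizes)
         if bx * t <= 512]                      # invalid configurations are excluded

def measure_runtime(bx, t):
    """Stand-in for compiling and benchmarking the kernel (lower is better)."""
    return (bx - 96) ** 2 / 1e4 + (t - 4) ** 2 / 10 + np.random.normal(0, 0.01)

X = np.array(space, dtype=float)
rng = np.random.default_rng(0)

# Start from a few random valid configurations.
evaluated = list(rng.choice(len(space), size=3, replace=False))
y = [measure_runtime(*space[i]) for i in evaluated]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X[evaluated], np.array(y))
    mu, sigma = gp.predict(X, return_std=True)
    kappa = 2.0                        # fixed exploration factor; the paper's
                                       # contextual variance factor would adapt this
    acq = -(mu - kappa * sigma)        # lower-confidence-bound acquisition (minimization)
    acq[evaluated] = -np.inf           # never re-evaluate a known configuration
    nxt = int(np.argmax(acq))
    evaluated.append(nxt)
    y.append(measure_runtime(*space[nxt]))

best = int(np.argmin(y))
print("best configuration:", space[evaluated[best]], "runtime:", y[best])
```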
Related papers
- Optimizing Posterior Samples for Bayesian Optimization via Rootfinding [2.94944680995069]
We introduce an efficient global optimization strategy for posterior samples based on global rootfinding.
We demonstrate remarkable improvement in both inner- and outer-loop optimization.
We also propose a sample-average formulation of GP-TS, which has a parameter to explicitly control exploitation.
arXiv Detail & Related papers (2024-10-29T17:57:16Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Optimistic Optimization of Gaussian Process Samples [30.226274682578172]
A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function.
We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
arXiv Detail & Related papers (2022-09-02T09:06:24Z) - Bayesian Optimization over Permutation Spaces [30.650753803587794]
We propose and evaluate two algorithms for BO over Permutation Spaces (BOPS).
We theoretically analyze the performance of BOPS-T to show that its regret grows sub-linearly.
Our experiments on multiple synthetic and real-world benchmarks show that both BOPS-T and BOPS-H perform better than the state-of-the-art BO algorithm for such spaces.
arXiv Detail & Related papers (2021-12-02T08:20:50Z) - Fighting the curse of dimensionality: A machine learning approach to
finding global optima [77.34726150561087]
This paper shows how to find global optima in structural optimization problems.
By exploiting certain cost functions we either obtain the global optimum at best or obtain superior results at worst when compared to established optimization procedures.
arXiv Detail & Related papers (2021-10-28T09:50:29Z) - Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace (a toy sketch of this loop appears after this list).
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
arXiv Detail & Related papers (2020-09-08T06:54:11Z) - NOVAS: Non-convex Optimization via Adaptive Stochastic Search for
End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive search as a building block for general, non-convex optimization operations.
We benchmark it against two existing alternatives on a synthetic energy-based structured task, and showcase its use in optimal control applications.
arXiv Detail & Related papers (2020-06-22T03:40:36Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several regions of low function value before reaching the optimum of the function.
One approach to shortening this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that the space expands around the high-probability region of the function optimum and shrinks around the low-probability regions (a quantile-based sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-03-27T06:18:49Z) - Towards Automatic Bayesian Optimization: A first step involving
acquisition functions [0.0]
Bayesian optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions for which we do not have access to an analytical expression.
We propose a first attempt at automatic Bayesian optimization by exploring several techniques that automatically tune the acquisition function.
arXiv Detail & Related papers (2020-03-21T12:22:45Z)
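The subspace-search loop summarised above for "Sequential Subspace Search for Functional Bayesian Optimization Incorporating Experimenter Intuition" can be sketched as follows. The grid representation, RBF covariance, toy function-matching objective, and the use of random search in place of the inner Bayesian-optimisation step are simplifying assumptions, not the authors' implementation.

```python
# Illustrative sketch of sequential subspace search for functional optimisation:
# each round spans a random finite-dimensional subspace with draws from a GP,
# searches it, and re-centres the next subspace on the best function found.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 100)                  # functions represented on a grid

def rbf_cov(x, lengthscale=0.2):
    """Squared-exponential covariance over the grid, with jitter for stability."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-6 * np.eye(len(x))

target = np.sin(6 * grid)                      # hidden function to match

def loss(f):
    """Blind function matching: mean squared distance to the hidden target."""
    return np.mean((f - target) ** 2)

cov = rbf_cov(grid)
origin = np.zeros_like(grid)                   # current best function
k = 4                                          # dimension of each random subspace

for round_idx in range(10):
    # Span a k-dimensional subspace with fresh draws from the experimenter's GP.
    basis = rng.multivariate_normal(np.zeros_like(grid), cov, size=k)
    best_f, best_val = origin, loss(origin)
    # Random search stands in here for the inner Bayesian-optimisation loop.
    for _ in range(200):
        coeffs = rng.uniform(-1, 1, size=k)
        candidate = origin + coeffs @ basis
        val = loss(candidate)
        if val < best_val:
            best_f, best_val = candidate, val
    origin = best_f                            # re-centre the next subspace here
    print(f"round {round_idx}: matching loss = {best_val:.4f}")
```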
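The space-warping idea summarised above for "Incorporating Expert Prior in Bayesian Optimisation via Space Warping" can likewise be illustrated with a quantile-based transform. The truncated-normal prior, the one-dimensional setting, and the specific warp below are assumptions for illustration, not the paper's exact construction.

```python
# Illustrative sketch of warping a 1-D search space [0, 1] with a prior over the
# optimum's location, so that uniform exploration of the warped coordinate u
# concentrates evaluations near the prior mode.
import numpy as np
from scipy.stats import truncnorm

# Expert prior over the location of the optimum in the unit interval.
prior_mean, prior_std = 0.7, 0.1
a, b = (0 - prior_mean) / prior_std, (1 - prior_mean) / prior_std
prior = truncnorm(a, b, loc=prior_mean, scale=prior_std)

def warp(u):
    """Map the warped coordinate u in [0, 1] back to the original space via the
    prior's quantile function: equal slices of u cover little of the original
    space near the prior mode (expanding that region in u) and a lot where the
    prior is low (shrinking those regions in u)."""
    return prior.ppf(u)

def objective(x):
    """Stand-in black-box function with its optimum near x = 0.72."""
    return -np.exp(-((x - 0.72) ** 2) / 0.005)

# Any search strategy (uniform random search here, in place of BO, for brevity)
# now operates on u; its uniform proposals land disproportionately near the
# prior mode once mapped back through warp().
rng = np.random.default_rng(1)
u_samples = rng.uniform(0, 1, size=30)
x_samples = warp(u_samples)
best = x_samples[np.argmin(objective(x_samples))]
print("best x found:", float(best))
```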