Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization
Pragmas Using Bayesian Optimization
- URL: http://arxiv.org/abs/2010.08040v1
- Date: Thu, 15 Oct 2020 22:09:42 GMT
- Title: Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization
Pragmas Using Bayesian Optimization
- Authors: Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul
Hovland, Valerie Taylor, Mary Hall
- Abstract summary: Autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application.
We develop an autotuning framework that leverages Bayesian optimization to explore the parameter search space.
- Score: 0.6583716093321499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autotuning is an approach that explores a search space of possible
implementations/configurations of a kernel or an application by selecting and
evaluating a subset of them on a target platform and/or using models to
identify a high-performance implementation/configuration.
In this paper, we develop an autotuning framework that leverages Bayesian
optimization to explore the parameter search space. We select six of the most
complex benchmarks from the application domains of the PolyBench benchmarks
(syr2k, 3mm, heat-3d, lu, covariance, and Floyd-Warshall) and apply the newly
developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to
optimize them. We then use the autotuning framework to optimize the pragma
parameters to improve their performance. The experimental results show that our
autotuning approach outperforms other compilation methods, yielding the
smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and
covariance with two large datasets within 200 code evaluations, effectively
searching parameter spaces with up to 170,368 different configurations. We
compare four different supervised learning methods within Bayesian optimization
and evaluate their effectiveness. We find that the Floyd-Warshall benchmark did
not benefit from autotuning because Polly's optimization heuristics actually
make it run much slower. To cope with this issue, we provide some
compiler-option solutions that improve its performance.
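The loop described in the abstract, proposing a pragma configuration, compiling and timing the kernel, and updating a surrogate model, can be sketched in a few lines. This is a minimal illustration, not the paper's framework: the parameter space, the synthetic runtime model, and the distance-weighted surrogate (a crude stand-in for the Gaussian-process and tree-based models the paper compares) are all hypothetical.

```python
import itertools
import math
import random

random.seed(0)

# Hypothetical discrete pragma-parameter space, standing in for Clang/Polly
# tile-size / unroll parameters (the paper's real spaces reach 170,368 configs).
SPACE = list(itertools.product([4, 8, 16, 32, 64],   # tile size for loop i
                               [4, 8, 16, 32, 64],   # tile size for loop j
                               [1, 2, 4, 8]))        # unroll factor

def measure(cfg):
    """Synthetic runtime model: a stand-in for compiling the kernel with the
    pragma parameters applied and timing the resulting binary."""
    ti, tj, u = cfg
    return (math.log2(ti) - 5) ** 2 + (math.log2(tj) - 4) ** 2 + (u - 2) ** 2

def surrogate(cfg, history):
    """Crude distance-weighted surrogate with an exploration bonus; a stand-in
    for the supervised models compared inside Bayesian optimization."""
    num = den = 0.0
    for c, y in history:
        d = sum((math.log2(a) - math.log2(b)) ** 2 for a, b in zip(cfg, c))
        w = math.exp(-d)
        num += w * y
        den += w
    return num / den - 1.0 / den  # lower-confidence bound for minimisation

def autotune(budget=30):
    history = [(c, measure(c)) for c in random.sample(SPACE, 5)]  # warm start
    for _ in range(budget - len(history)):
        seen = {c for c, _ in history}
        cand = min((c for c in SPACE if c not in seen),
                   key=lambda c: surrogate(c, history))
        history.append((cand, measure(cand)))
    return min(history, key=lambda h: h[1])

best_cfg, best_time = autotune()
print("best configuration:", best_cfg, "modelled runtime:", best_time)
```

The key property this sketch shares with the paper's approach is sample efficiency: only `budget` configurations out of the full space are ever evaluated, with the surrogate deciding which to try next.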
Related papers
- Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks.
In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z)
- Compiler generated feedback for Large Language Models [3.86901256759401]
We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly.
The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs.
arXiv Detail & Related papers (2024-03-18T23:25:13Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
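The sign-based zeroth-order method named above can be illustrated on a toy quadratic objective; this is a minimal sketch with a hypothetical stand-in for a molecular score, not the Guacamol setup. Gradients are estimated purely from function queries via random-direction finite differences, and only the sign of each estimated component is used for the update.

```python
import random

random.seed(1)

def zo_sign_gd(f, x, lr=0.1, mu=1e-3, queries=20, steps=100):
    """Zeroth-order sign-based gradient descent: estimate the gradient with
    random-direction finite differences (no analytic gradients needed), then
    step using only the sign of each estimated component, which is robust to
    noisy magnitude estimates."""
    n = len(x)
    for _ in range(steps):
        grad = [0.0] * n
        for _ in range(queries):
            u = [random.gauss(0, 1) for _ in range(n)]
            # Directional finite difference along the random direction u.
            d = (f([xi + mu * ui for xi, ui in zip(x, u)]) - f(x)) / mu
            for i in range(n):
                grad[i] += d * u[i] / queries
        x = [xi - lr * (1 if g > 0 else -1 if g < 0 else 0)
             for xi, g in zip(x, grad)]
    return x

# Toy objective standing in for a black-box molecular score: ||x - target||^2.
target = [0.5, -1.0, 2.0]
f = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))
x = zo_sign_gd(f, [0.0, 0.0, 0.0])
```

Because the step size is fixed, the iterate oscillates within roughly `lr` of the optimum rather than converging exactly, which is the usual trade-off for sign-based updates.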
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we rearrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [0.6543507682026964]
Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
arXiv Detail & Related papers (2021-09-13T19:10:23Z)
- Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version) [0.8070511670572696]
We use LLVM Clang/Polly loop optimization pragmas to optimize PolyBench benchmarks.
We then use the autotuning framework to optimize the pragma parameters to improve their performance.
We present loop autotuning without a user's knowledge using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark.
arXiv Detail & Related papers (2021-04-27T14:46:57Z)
- Autotuning Search Space for Loop Transformations [0.03683202928838612]
We propose a loop transformation search space that takes the form of a tree.
We implemented a simple autotuner exploring the search space and applied it to a selected set of PolyBench kernels.
arXiv Detail & Related papers (2020-10-13T16:26:57Z)
- Static Neural Compiler Optimization via Deep Reinforcement Learning [1.458855293397494]
In this paper, we employ a deep reinforcement learning approach to the phase-ordering problem.
Provided with sub-sequences constituting LLVM's O3 sequence, our agent learns to outperform the O3 sequence on the set of source codes used for training.
We believe that the models trained using our approach can be integrated into modern compilers as neural optimization agents.
arXiv Detail & Related papers (2020-08-20T13:16:29Z)
- Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several low-function-value regions before reaching the optimum of the function.
One approach to shortening this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that the space is expanded around the high-probability region of the function optimum and shrunk around the low-probability region.
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
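The warping idea from the last entry can be sketched in a few lines: a hypothetical Gaussian prior over the optimum's location remaps uniform search coordinates through its inverse CDF, so samples concentrate where the prior places mass. The interval and prior parameters below are illustrative assumptions, not values from the paper.

```python
import random
from statistics import NormalDist

random.seed(2)

# Hypothetical expert prior: the optimum is believed to lie near x = 3.0
# within the search interval [0, 10].
prior = NormalDist(mu=3.0, sigma=0.5)
LO, HI = 0.0, 10.0

def warp(u):
    """Map a uniform coordinate u in [0, 1] through the prior's inverse CDF,
    expanding the space around the prior mode and shrinking it elsewhere."""
    # Restrict to the probability mass the prior places inside [LO, HI].
    p_lo, p_hi = prior.cdf(LO), prior.cdf(HI)
    return prior.inv_cdf(p_lo + u * (p_hi - p_lo))

# Uniform sampling in the warped coordinates clusters near the prior mean,
# so an optimiser spends its early budget in the high-probability region.
warped = [warp(random.random()) for _ in range(1000)]
near = sum(1 for x in warped if abs(x - 3.0) < 1.0) / len(warped)
```

With this prior, roughly 95% of uniform samples land within one unit of the assumed optimum location, while the rest of the interval still receives occasional coverage.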
This list is automatically generated from the titles and abstracts of the papers in this site.