Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization
Pragmas Using Bayesian Optimization (extended version)
- URL: http://arxiv.org/abs/2104.13242v1
- Date: Tue, 27 Apr 2021 14:46:57 GMT
- Title: Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization
Pragmas Using Bayesian Optimization (extended version)
- Authors: Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul
Hovland, Valerie Taylor, and Mary Hall
- Abstract summary: We use LLVM Clang/Polly loop optimization pragmas to optimize PolyBench benchmarks.
We then use the autotuning framework to tune the pragma parameters to improve their performance.
We present loop autotuning that requires no user knowledge, using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark.
- Score: 0.8070511670572696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop the ytopt autotuning framework, which leverages
Bayesian optimization to explore the parameter search space; we compare four
different supervised learning methods within Bayesian optimization and evaluate
their effectiveness. We select six of the most complex PolyBench benchmarks and
apply the newly developed LLVM Clang/Polly loop optimization pragmas to
optimize them. We then use the autotuning framework to tune the pragma
parameters to improve their performance. The experimental results show that our
autotuning approach outperforms the other compiling methods, providing the
smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and
covariance with two large datasets in 200 code evaluations, effectively
searching parameter spaces with up to 170,368 different configurations. We find
that the Floyd-Warshall benchmark did not benefit from autotuning because
Polly's heuristic optimizations make it run much slower. To cope with this
issue, we provide some compiler option solutions that improve its performance.
We then present loop autotuning that requires no user knowledge, using a simple
mctree autotuning framework to further improve the performance of the
Floyd-Warshall benchmark. We also extend the ytopt autotuning framework to tune
a deep learning application.
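The tuning loop the abstract describes can be sketched as follows. This is a minimal illustration only: it uses random search as a stand-in for ytopt's Bayesian surrogate model, and both the parameter names and the synthetic objective are hypothetical, not the paper's actual Polly pragma search space. In the real framework, each evaluation compiles the benchmark with the chosen pragma values and times its execution.

```python
import random

# Hypothetical pragma parameter space (illustrative names and values,
# not the actual PolyBench/Polly search space from the paper).
param_space = {
    "tile_size": [16, 32, 64, 128],
    "unroll_factor": [1, 2, 4, 8],
    "interchange": [True, False],
}

def evaluate(config):
    """Stand-in objective: the real framework would compile the benchmark
    with these pragma values and return its measured execution time."""
    return (abs(config["tile_size"] - 64) / 64
            + abs(config["unroll_factor"] - 4) / 4
            + (0.5 if not config["interchange"] else 0.0))

def tune(n_evals=200, seed=0):
    """Search the space in n_evals code evaluations (random search here;
    Bayesian optimization would propose configs via a surrogate model)."""
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(n_evals):
        cfg = {k: rng.choice(v) for k, v in param_space.items()}
        t = evaluate(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best_cfg, best_time = tune()
print(best_cfg, best_time)
```

The design point the paper exploits is that 200 evaluations are far cheaper than exhaustively timing up to 170,368 configurations; a Bayesian surrogate makes those 200 evaluations count by steering proposals toward promising regions instead of sampling uniformly as above.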
Related papers
- Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks.
In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z) - Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (LLMs) are complementary to each other, suggesting a collaborative optimization approach.
Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free
Optimization Algorithms [0.6543507682026964]
Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
arXiv Detail & Related papers (2021-09-13T19:10:23Z) - Using hardware performance counters to speed up autotuning convergence
on GPUs [0.0]
We introduce a novel method for searching tuning spaces.
The method takes advantage of collecting hardware performance counters during empirical tuning.
We experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics.
arXiv Detail & Related papers (2021-02-10T07:42:39Z) - Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization
Pragmas Using Bayesian Optimization [0.6583716093321499]
Autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application.
We develop an autotuning framework that leverages Bayesian optimization to explore the parameter space search.
arXiv Detail & Related papers (2020-10-15T22:09:42Z) - Autotuning Search Space for Loop Transformations [0.03683202928838612]
We propose a loop transformation search space that takes the form of a tree.
We implemented a simple autotuner exploring the search space and applied it to a selected set of PolyBench kernels.
arXiv Detail & Related papers (2020-10-13T16:26:57Z) - Static Neural Compiler Optimization via Deep Reinforcement Learning [1.458855293397494]
In this paper, we employ a deep reinforcement learning approach to the phase-ordering problem.
Provided with sub-sequences constituting LLVM's O3 sequence, our agent learns to outperform the O3 sequence on the set of source codes used for training.
We believe that the models trained using our approach can be integrated into modern compilers as neural optimization agents.
arXiv Detail & Related papers (2020-08-20T13:16:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.