OLLIE: Derivation-based Tensor Program Optimizer
- URL: http://arxiv.org/abs/2208.02025v1
- Date: Tue, 2 Aug 2022 14:38:58 GMT
- Title: OLLIE: Derivation-based Tensor Program Optimizer
- Authors: Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei
Wang, Shizhi Tang, Lei Xie, Kezhao Huang and Zhihao Jia
- Abstract summary: We propose OLLIE, the first derivation-based tensor program optimizer.
We show that OLLIE can outperform existing optimizers by up to 2.73$\times$ (1.46$\times$ on average) on an A100 GPU and up to 2.68$\times$ (1.51$\times$ on average) on a V100 GPU.
- Score: 13.23204410403652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Boosting the runtime performance of deep neural networks (DNNs) is critical
due to their wide adoption in real-world tasks. Existing approaches to
optimizing the tensor algebra expression of a DNN only consider expressions
representable by a fixed set of predefined operators, missing possible
optimization opportunities between general expressions. We propose OLLIE, the
first derivation-based tensor program optimizer. OLLIE optimizes tensor
programs by leveraging transformations between general tensor algebra
expressions, enabling a significantly larger expression search space that
includes those supported by prior work as special cases. OLLIE uses a hybrid
derivation-based optimizer that effectively combines explorative and guided
derivations to quickly discover highly optimized expressions. Evaluation on
seven DNNs shows that OLLIE can outperform existing optimizers by up to
2.73$\times$ (1.46$\times$ on average) on an A100 GPU and up to 2.68$\times$
(1.51$\times$) on a V100 GPU, respectively.
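To make the derivation idea concrete, below is a minimal, self-contained sketch of a derivation-based search over tensor algebra expressions. The expression encoding, the single associativity rule, and the FLOP-count cost model are illustrative assumptions, not OLLIE's actual derivation rules or cost model.

```python
# Toy derivation-based expression search (illustrative only; not OLLIE's rules
# or cost model). Expressions: ("input", name, shape) or ("matmul", lhs, rhs).
def shape(e):
    if e[0] == "input":
        return e[2]
    return (shape(e[1])[0], shape(e[2])[1])

def flops(e):
    if e[0] == "input":
        return 0
    (m, k), (_, n) = shape(e[1]), shape(e[2])
    return 2 * m * k * n + flops(e[1]) + flops(e[2])

def derive(e):
    """Yield expressions reachable by one derivation step (here: associativity)."""
    if e[0] != "matmul":
        return
    a, b = e[1], e[2]
    if a[0] == "matmul":                       # (X @ Y) @ Z -> X @ (Y @ Z)
        yield ("matmul", a[1], ("matmul", a[2], b))
    if b[0] == "matmul":                       # X @ (Y @ Z) -> (X @ Y) @ Z
        yield ("matmul", ("matmul", a, b[1]), b[2])
    for i in (1, 2):                           # also derive inside sub-expressions
        for sub in derive(e[i]):
            yield e[:i] + (sub,) + e[i + 1:]

def explore(e, steps=4):
    """Explorative search: repeatedly derive, keep the cheapest expression seen."""
    frontier, seen, best = {e}, {e}, e
    for _ in range(steps):
        frontier = {d for x in frontier for d in derive(x)} - seen
        seen |= frontier
        best = min(frontier | {best}, key=flops)
    return best

A = ("input", "A", (1024, 32))
B = ("input", "B", (32, 1024))
C = ("input", "C", (1024, 16))
expr = ("matmul", ("matmul", A, B), C)
print(flops(expr), flops(explore(expr)))   # re-association cuts the FLOP count
```

OLLIE additionally combines such explorative steps with guided derivations toward efficiently executable expressions; this sketch only shows the explorative half on a single rewrite rule.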
Related papers
- Optimal Kernel Orchestration for Tensor Programs with Korch [13.143585283794902]
Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms.
This paper presents Korch, a program that discovers optimal kernel orchestration strategies for tensor programs.
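As a rough illustration of what kernel orchestration means, the toy sketch below picks kernels (fused or unfused, with made-up costs) to cover a chain of operators at minimum total cost via dynamic programming. It is a simplified stand-in, not Korch's actual strategy-discovery algorithm, and the operator names and costs are invented.

```python
# Toy kernel-orchestration sketch (hypothetical costs, not Korch's formulation):
# pick kernels covering a chain of operators so that total estimated time is minimal.
from functools import lru_cache

ops = ["matmul", "bias_add", "relu", "softmax"]

# kernel_cost[(i, j)] = cost of one kernel covering ops[i:j] (fused when j - i > 1).
kernel_cost = {
    (0, 1): 10.0, (1, 2): 2.0, (2, 3): 1.5, (3, 4): 3.0,   # unfused kernels
    (0, 3): 11.0, (1, 3): 2.5, (2, 4): 4.0,                # fused candidates
}

@lru_cache(maxsize=None)
def best(i):
    """Minimum cost to execute ops[i:], plus the chosen kernel boundaries."""
    if i == len(ops):
        return 0.0, ()
    options = []
    for (a, b), c in kernel_cost.items():
        if a == i:
            cost, plan = best(b)
            options.append((c + cost, ((a, b),) + plan))
    return min(options)

total, plan = best(0)
print(total, [ops[a:b] for a, b in plan])   # cheapest covering of the chain
```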
arXiv Detail & Related papers (2024-06-13T04:44:38Z)
- Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO).
ZOPO incorporates a Neural Tangent Kernel-derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
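For readers unfamiliar with zeroth-order optimization, the sketch below shows its core idea: estimating gradients from function evaluations alone via random perturbations. The objective, step size, and perturbation scheme are toy assumptions; ZOPO's NTK-derived Gaussian process and prompt representation are not modeled.

```python
# Generic zeroth-order optimization sketch (not ZOPO's NTK-GP machinery):
# estimate gradients from function evaluations only, via two-point perturbations.
import numpy as np

def score(z):
    # Stand-in black-box objective, e.g. accuracy of the prompt encoded by z.
    return -np.sum((z - 1.0) ** 2)

def zo_gradient(f, z, mu=1e-2, samples=16, rng=np.random.default_rng(0)):
    g = np.zeros_like(z)
    for _ in range(samples):
        u = rng.standard_normal(z.shape)
        g += (f(z + mu * u) - f(z - mu * u)) / (2 * mu) * u
    return g / samples

z = np.zeros(8)
for step in range(200):
    z += 0.05 * zo_gradient(score, z)   # ascend the estimated gradient
print(round(score(z), 4))               # approaches the maximum at z = 1
```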
arXiv Detail & Related papers (2024-03-05T14:18:15Z)
- Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers.
In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values.
We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
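A minimal sketch of the OPRO-style loop described above: previously generated solutions and their scores are placed in a meta-prompt, and the model is asked to propose a better solution. The ask_llm and evaluate functions here are hypothetical stubs for illustration, not a real LLM API or scoring pipeline.

```python
# OPRO-style optimization loop with stubbed components (ask_llm is hypothetical).
import random

random.seed(0)

def evaluate(solution: str) -> float:
    # Stand-in scorer, e.g. task accuracy of an instruction on a dev set.
    return -abs(len(solution) - 20)

def ask_llm(meta_prompt: str) -> str:
    # Placeholder for a real LLM call; returns a random candidate for the demo.
    return "solve it step by step"[: random.randint(5, 30)]

history = [("let's think", evaluate("let's think"))]
for step in range(10):
    history.sort(key=lambda p: p[1])
    meta_prompt = "Previous solutions and scores:\n" + "\n".join(
        f"{s!r}: {v}" for s, v in history[-5:]
    ) + "\nPropose a better solution."
    candidate = ask_llm(meta_prompt)
    history.append((candidate, evaluate(candidate)))
print(max(history, key=lambda p: p[1]))   # best solution found and its score
```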
arXiv Detail & Related papers (2023-09-07T00:07:15Z)
- Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
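The sketch below illustrates the general idea of a double exponential moving average momentum update on a toy quadratic. The coefficients and the update rule are illustrative assumptions and do not reproduce the exact AdmetaR/AdmetaS algorithms.

```python
# Sketch of a double-EMA momentum update (illustrative only, not the exact
# Admeta update rule): a second EMA further smooths the first.
import numpy as np

def double_ema_sgd(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.9, steps=1000):
    m = np.zeros_like(w)   # first EMA of gradients (standard momentum)
    v = np.zeros_like(w)   # second EMA, applied on top of the first
    for _ in range(steps):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * m
        w = w - lr * v
    return w

# Toy quadratic: minimize 0.5 * ||w - 3||^2, whose gradient is (w - 3).
w = double_ema_sgd(lambda w: w - 3.0, np.zeros(4))
print(w.round(3))   # close to 3 after enough steps
```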
arXiv Detail & Related papers (2023-07-02T18:16:06Z)
- Learning to Generalize Provably in Learning to Optimize [185.71326306329678]
Learning to optimize (L2O), which automates the design of optimizers with data-driven approaches, has gained increasing popularity.
Current L2O methods often suffer from poor generalization performance in at least two respects.
We propose to incorporate two corresponding flatness-aware metrics as regularizers into the L2O framework.
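As a rough illustration of what a flatness-aware regularizer looks like, the sketch below adds a perturbation-based sharpness proxy to a plain training loss. The proxy, problem, and weighting are generic assumptions; the paper's specific generalization metrics are not reproduced.

```python
# Generic flatness-aware regularizer sketch (a sharpness proxy, not the paper's
# exact metrics): penalize how much the loss rises under small weight perturbations.
import numpy as np

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

def sharpness_penalty(w, X, y, rho=0.05, samples=8, rng=np.random.default_rng(0)):
    base = loss(w, X, y)
    worst = max(loss(w + rho * rng.standard_normal(w.shape), X, y)
                for _ in range(samples))
    return worst - base          # large value <=> sharp minimum

data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((64, 5))
y = X @ np.ones(5)
w = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares fit
total = loss(w, X, y) + 0.1 * sharpness_penalty(w, X, y)
print(round(total, 4))                             # loss plus flatness penalty
```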
arXiv Detail & Related papers (2023-02-22T01:17:31Z)
- oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation [8.64220475114214]
oneDNN Graph Compiler employs a hybrid approach, using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation.
Experimental results demonstrate significant performance gains over an existing tensor compiler and a primitives library on performance-critical computation graphs.
arXiv Detail & Related papers (2023-01-03T19:52:17Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
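The sketch below shows the shape of such a learned optimizer: a tiny network that maps per-parameter gradient features to parameter updates. The network here is untrained with random weights, so its step is arbitrary; it is an illustrative toy, not VeLO's meta-trained architecture.

```python
# Learned-optimizer sketch in the VeLO spirit (untrained toy, not VeLO itself):
# a tiny MLP maps per-parameter gradient features to proposed updates.
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((2, 16))
W2 = 0.1 * rng.standard_normal((16, 1))

def learned_update(grad, momentum):
    feats = np.stack([grad, momentum], axis=-1)   # (..., 2) features per parameter
    hidden = np.tanh(feats @ W1)                  # (..., 16)
    return (hidden @ W2)[..., 0]                  # (...,) proposed parameter update

# Apply it like a normal optimizer step; in VeLO the network is meta-trained,
# here its random weights make the direction arbitrary.
params = np.zeros(4)
grad = params - 3.0                               # toy quadratic gradient
momentum = 0.1 * grad
params = params + learned_update(grad, momentum)
print(params)
```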
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- ALT: Breaking the Wall between Graph and Operator Level Optimizations for Deep Learning Compilation [38.8918502461244]
ALT is a compiler that performs joint graph- and operator-level optimizations for deep models.
ALT significantly outperforms state-of-the-art compilers (e.g., Ansor) in terms of both single-operator performance and end-to-end inference performance.
arXiv Detail & Related papers (2022-10-22T11:09:36Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost.
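For context, the sketch below shows a standard preconditioned conjugate-gradient iteration with a Jacobi preconditioner on a synthetic symmetric positive-definite "kernel" matrix, i.e., the kind of iterative matrix-vector method the summary refers to. The paper's variance-reduction analysis itself is not reproduced.

```python
# Generic preconditioned conjugate-gradient sketch (Jacobi preconditioner).
import numpy as np

def preconditioned_cg(K, y, M_inv, tol=1e-8, max_iter=200):
    x = np.zeros_like(y)
    r = y - K @ x                      # residual
    z = M_inv @ r                      # preconditioned residual
    p = z.copy()
    for _ in range(max_iter):
        Kp = K @ p
        alpha = (r @ z) / (p @ Kp)
        x += alpha * p
        r_new = r - alpha * Kp
        if np.linalg.norm(r_new) < tol:
            break
        z_new = M_inv @ r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
K = A @ A.T + 50 * np.eye(50)          # synthetic SPD "kernel" matrix
y = rng.standard_normal(50)
M_inv = np.diag(1.0 / np.diag(K))      # Jacobi (diagonal) preconditioner
x = preconditioned_cg(K, y, M_inv)
print(np.linalg.norm(K @ x - y))       # ~0: the solve converged
```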
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations [15.659251804042748]
Woodpecker-DL (WPK) is a hardware-aware deep learning framework.
WPK uses graph optimization, automated searches, a domain-specific language (DSL), and system-level exploration to accelerate inference.
We show that on a P100 GPU, WPK achieves speedups of 5.40 over cuDNN and 1.63 over TVM on individual operators, and runs up to 1.18 times faster than TensorRT for end-to-end model inference.
arXiv Detail & Related papers (2020-08-11T07:50:34Z) - OpEvo: An Evolutionary Method for Tensor Operator Optimization [6.273446055072434]
We propose a novel evolutionary method, OpEvo, which efficiently explores the search spaces of tensor operators.
Our comprehensive experiments show that OpEvo can find the best configuration with the lowest variance and the least effort in terms of number of trials and wall-clock time.
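To give a feel for this kind of search, the sketch below runs a generic evolutionary loop over tensor-operator tile configurations with a mock cost function. OpEvo's actual mutation operators and search-space modeling are not reproduced; on real hardware the cost would come from measured kernel runtimes.

```python
# Toy evolutionary search over tensor-operator configurations (mock cost model,
# not OpEvo's method): evolve tile sizes for a hypothetical GEMM schedule.
import random

random.seed(0)
TILES = [1, 2, 4, 8, 16, 32, 64]

def cost(cfg):
    # Stand-in for an on-device measurement of a (tile_m, tile_n, tile_k) schedule.
    tm, tn, tk = cfg
    return abs(tm * tn - 256) + abs(tk - 8) + 0.01 * (tm + tn + tk)

def mutate(cfg):
    new = list(cfg)
    new[random.randrange(3)] = random.choice(TILES)   # perturb one tile size
    return tuple(new)

population = [tuple(random.choice(TILES) for _ in range(3)) for _ in range(16)]
for gen in range(30):
    population.sort(key=cost)
    parents = population[:4]                          # keep the fittest schedules
    population = parents + [mutate(random.choice(parents)) for _ in range(12)]

best = min(population, key=cost)
print(best, round(cost(best), 2))                     # best schedule found and its mock cost
```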
arXiv Detail & Related papers (2020-06-10T05:33:33Z)