OpEvo: An Evolutionary Method for Tensor Operator Optimization
- URL: http://arxiv.org/abs/2006.05664v2
- Date: Mon, 21 Dec 2020 08:02:18 GMT
- Title: OpEvo: An Evolutionary Method for Tensor Operator Optimization
- Authors: Xiaotian Gao, Cui Wei, Lintao Zhang and Mao Yang
- Abstract summary: We propose a novel evolutionary method, OpEvo, which efficiently explores the search spaces of tensor operators.
Our comprehensive experiment results show that OpEvo can find the best configuration with the lowest variance and the least effort in terms of number of trials and wall-clock time.
- Score: 6.273446055072434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training and inference efficiency of deep neural networks highly rely on the
performance of tensor operators on hardware platforms. Manually optimizing
tensor operators has limitations in terms of supporting new operators or
hardware platforms. Therefore, automatically optimizing device code
configurations of tensor operators is getting increasingly attractive. However,
current methods for tensor operator optimization usually suffer from poor
sample-efficiency due to the combinatorial search space. In this work, we
propose a novel evolutionary method, OpEvo, which efficiently explores the
search spaces of tensor operators by introducing a topology-aware mutation
operation based on q-random walk to leverage the topological structures over
the search spaces. Our comprehensive experiment results show that, compared with
state-of-the-art (SOTA) methods, OpEvo can find the best configuration with the
lowest variance and the least effort in terms of number of trials and wall-clock time.
All code of this work is available online.
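The core idea above is a topology-aware mutation that perturbs a configuration by a q-random walk over the ordered values of each tuning knob, so that topologically nearby configurations are sampled more often while distant ones remain reachable. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the knob names, the geometric stop-or-step rule, and the value of q are assumptions made for the example.

```python
import random

# Hypothetical tuning knobs for a tiled kernel: each knob's values are ordered,
# so adjacent entries tend to behave similarly on hardware (the "topology").
KNOBS = {
    "tile_x": [1, 2, 4, 8, 16, 32, 64],
    "tile_y": [1, 2, 4, 8, 16, 32, 64],
    "unroll": [0, 1],
}

def q_random_walk(idx, size, q=0.7):
    """Random walk on the 1-D neighbor graph {0, ..., size-1} starting at idx.

    At every step the walker stops with probability (1 - q), otherwise it moves
    to a uniformly chosen neighbor.  Small q keeps mutations local; q close to 1
    lets them occasionally travel far along the axis.  (Assumed stopping rule.)
    """
    while random.random() < q:
        neighbors = [j for j in (idx - 1, idx + 1) if 0 <= j < size]
        idx = random.choice(neighbors)
    return idx

def mutate(config, q=0.7):
    """Topology-aware mutation: perturb one knob of a configuration via a q-random walk."""
    child = dict(config)
    knob = random.choice(list(KNOBS))
    values = KNOBS[knob]
    i = values.index(child[knob])
    child[knob] = values[q_random_walk(i, len(values), q)]
    return child

if __name__ == "__main__":
    parent = {"tile_x": 8, "tile_y": 16, "unroll": 1}
    print(mutate(parent))  # e.g. {'tile_x': 8, 'tile_y': 32, 'unroll': 1}
```

In an evolutionary loop, a mutation of this kind would be applied to the best-measured configurations before compiling and benchmarking the resulting kernels on the target device.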
Related papers
- Syno: Structured Synthesis for Neural Operators [1.5826646053411249]
We develop an end-to-end framework Syno, to realize practical neural operator synthesis.
We demonstrate that Syno discovers better operators with an average of $2.06\times$ speedup and less than $1\%$ accuracy loss, even on NAS-optimized models.
arXiv Detail & Related papers (2024-10-31T09:00:24Z) - Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers [66.823588073584]
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications.
Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs.
We propose a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs.
arXiv Detail & Related papers (2023-10-02T02:01:16Z) - Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains [13.56018270837999]
We propose a simple method to extend neural operators to arbitrary domains.
An efficient implementation of such direct spectral evaluations is coupled with existing neural operator models.
We demonstrate that the proposed method allows us to extend neural operators to arbitrary point distributions with significant gains in training speed over baselines.
arXiv Detail & Related papers (2023-05-31T09:01:20Z) - Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined through the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - OLLIE: Derivation-based Tensor Program Optimizer [13.23204410403652]
We propose OLLIE, the first derivation-based tensor program optimizer.
We show that OLLIE can outperform existing tensor program optimizers by up to $2.73\times$ ($1.46\times$ on average) on an A100 GPU and by up to $2.68\times$ on a V100 GPU.
arXiv Detail & Related papers (2022-08-02T14:38:58Z) - How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z) - AdaLead: A simple and robust adaptive greedy search algorithm for
sequence design [55.41644538483948]
We develop an easy-to-implement, scalable, and robust evolutionary greedy algorithm (AdaLead).
AdaLead is a remarkably strong benchmark that out-competes more complex state-of-the-art approaches in a variety of biologically motivated sequence design challenges.
arXiv Detail & Related papers (2020-10-05T16:40:38Z) - Adaptive Learning of Tensor Network Structures [6.407946291544721]
We leverage the TN formalism to develop a generic and efficient adaptive algorithm to learn the structure and the parameters of a TN from data.
Our algorithm can adaptively identify TN structures with a small number of parameters that effectively optimize any differentiable objective function.
arXiv Detail & Related papers (2020-08-12T16:41:56Z) - Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware
Multifaceted Optimizations [15.659251804042748]
Woodpecker-DL (WPK) is a hardware-aware deep learning framework.
WPK uses graph optimization, automated searches, a domain-specific language (DSL) and system-level exploration to accelerate inference.
We show that on a P100 GPU we can achieve a maximum speedup of 5.40 over cuDNN and 1.63 over TVM on individual operators, and run up to 1.18 times faster than TensorRT for end-to-end model inference.
arXiv Detail & Related papers (2020-08-11T07:50:34Z) - Differentiable Top-k Operator with Optimal Transport [135.36099648554054]
The SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem.
We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.
arXiv Detail & Related papers (2020-02-16T04:57:52Z)
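The SOFT top-k entry above casts top-k selection as an entropic optimal transport problem between the n scores and two anchor points, so the regularized transport plan yields a smooth surrogate for the hard top-k indicator. Below is a minimal NumPy sketch of that formulation; the min/max anchors, squared-distance cost, marginals, and Sinkhorn iteration count are assumptions made for illustration rather than the paper's exact construction.

```python
import numpy as np

def soft_topk(scores, k, eps=0.1, n_iters=200):
    """Smoothed top-k membership via entropic optimal transport (Sinkhorn).

    Transport n unit-mass scores onto two anchors ("bottom", "top") with target
    marginals ((n - k)/n, k/n); the mass each score sends to the "top" anchor,
    rescaled by n, acts as a soft indicator of membership in the top-k set.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    anchors = np.array([scores.min(), scores.max()])   # assumed bottom/top targets
    C = (scores[:, None] - anchors[None, :]) ** 2       # squared-distance cost
    K = np.exp(-C / eps)                                # Gibbs kernel

    a = np.full(n, 1.0 / n)                             # source marginal
    b = np.array([(n - k) / n, k / n])                  # target marginal
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iters):                            # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]                  # regularized transport plan
    return n * plan[:, 1]                               # soft top-k indicator

if __name__ == "__main__":
    x = np.array([0.1, 2.0, -0.5, 1.5, 0.3])
    print(np.round(soft_topk(x, k=2), 3))  # close to 1 for the 2 largest scores
```

Smaller eps sharpens the output toward the hard top-k indicator at the cost of a less smooth (and numerically stiffer) problem.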