Ansor: Generating High-Performance Tensor Programs for Deep Learning
- URL: http://arxiv.org/abs/2006.06762v5
- Date: Sun, 15 Oct 2023 07:00:36 GMT
- Title: Ansor: Generating High-Performance Tensor Programs for Deep Learning
- Authors: Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer
Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez,
Ion Stoica
- Abstract summary: We present Ansor, a tensor program generation framework for deep learning applications.
Ansor explores many more optimization combinations by sampling programs from a hierarchical representation of the search space.
Ansor can find high-performance programs that are outside the search space of existing state-of-the-art approaches.
- Score: 45.437816016043534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-performance tensor programs are crucial to guarantee efficient execution
of deep neural networks. However, obtaining performant tensor programs for
different operators on various hardware platforms is notoriously challenging.
Currently, deep learning systems rely on vendor-provided kernel libraries or
various search strategies to get performant tensor programs. These approaches
either require significant engineering effort to develop platform-specific
optimization code or fall short of finding high-performance programs due to
restricted search space and ineffective exploration strategy.
We present Ansor, a tensor program generation framework for deep learning
applications. Compared with existing search strategies, Ansor explores many
more optimization combinations by sampling programs from a hierarchical
representation of the search space. Ansor then fine-tunes the sampled programs
with evolutionary search and a learned cost model to identify the best
programs. Ansor can find high-performance programs that are outside the search
space of existing state-of-the-art approaches. In addition, Ansor utilizes a
task scheduler to simultaneously optimize multiple subgraphs in deep neural
networks. We show that Ansor improves the execution performance of deep neural
networks relative to the state-of-the-art on the Intel CPU, ARM CPU, and NVIDIA
GPU by up to $3.8\times$, $2.6\times$, and $1.7\times$, respectively.
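To make the search strategy described above concrete, here is a minimal, illustrative Python sketch of the two-stage idea: sample complete programs from a hierarchical space (a high-level sketch plus randomly filled low-level details), then fine-tune the samples with evolutionary search ranked by a cost model. All names, the schedule knobs, and the analytic "cost model" below are hypothetical stand-ins, not Ansor's actual implementation or API.

```python
import random

# Illustrative sketch of Ansor-style search (hypothetical structures, not the real API).
# A "program" is a dict of schedule decisions; the real system manipulates tensor IR.

SKETCHES = ["tile-fuse-reorder", "tile-cache-write", "rfactor"]   # high-level structures
CHOICES = {                                                       # low-level details
    "tile_i": [4, 8, 16, 32],
    "tile_j": [4, 8, 16, 32],
    "unroll": [0, 16, 64],
}

def sample_program():
    """Hierarchical sampling: pick a high-level sketch, then fill low-level details randomly."""
    program = {"sketch": random.choice(SKETCHES)}
    program.update({k: random.choice(v) for k, v in CHOICES.items()})
    return program

def mutate(program):
    """Evolutionary mutation: perturb a single schedule decision."""
    child = dict(program)
    key = random.choice(list(CHOICES))
    child[key] = random.choice(CHOICES[key])
    return child

def predicted_cost(program):
    """Stand-in for the learned cost model (a made-up analytic score, lower is better)."""
    return 1.0 / (program["tile_i"] * program["tile_j"]) + 0.001 * program["unroll"]

def evolutionary_search(population, generations=5, size=32):
    """Fine-tune sampled programs: mutate candidates, keep the lowest predicted cost."""
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(size)]
        population = sorted(population + children, key=predicted_cost)[:size]
    return population

if __name__ == "__main__":
    initial = [sample_program() for _ in range(32)]   # sample from the hierarchical space
    best = evolutionary_search(initial)[0]            # refine with evolutionary search
    print("best candidate:", best)
```

In the full system, the best candidates from each round are compiled and measured on real hardware and the measurements are used to update the learned cost model; that feedback loop is elided in this sketch for brevity.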
Related papers
- Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent [48.791943145735]
We show the potential to reduce Ansor's search time while enhancing kernel quality.
We apply this approach to the first 300 kernels that Ansor generates.
This result has been replicated in 20 well-known deep-learning models.
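As a rough illustration of the coordinate-descent idea (the schedule representation, parameter names, and measurement function below are hypothetical, not the paper's actual setup), fine-tuning can be sketched as optimizing one schedule knob at a time:

```python
def coordinate_descent(schedule, candidates, measure):
    """Fine-tune one parameter (coordinate) at a time: hold the rest fixed, try each
    candidate value, keep whichever measures fastest; stop when a full sweep finds
    no improvement. `schedule` is a dict of knobs, `measure` times it (here, a toy)."""
    best_time = measure(schedule)
    improved = True
    while improved:
        improved = False
        for param, values in candidates.items():
            for value in values:
                trial = dict(schedule, **{param: value})
                t = measure(trial)
                if t < best_time:
                    schedule, best_time, improved = trial, t, True
    return schedule, best_time

# Toy usage with a synthetic cost function standing in for real hardware measurements.
toy_cost = lambda s: abs(s["tile"] - 16) + abs(s["unroll"] - 64) / 8
best, runtime = coordinate_descent({"tile": 4, "unroll": 0},
                                   {"tile": [4, 8, 16, 32], "unroll": [0, 16, 64]},
                                   toy_cost)
print(best, runtime)
```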
arXiv Detail & Related papers (2024-06-28T16:34:22Z)
- HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks [51.71682428015139]
We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves tensor operator performance by 22% and search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
arXiv Detail & Related papers (2022-11-21T04:15:27Z)
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs [11.338285393619042]
We propose to embed the scheduling process into tensor programs and use dedicated mappings, called task mappings, to define the computation assignment and ordering.
With the proposed paradigm, we implement a deep learning compiler - Hidet.
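One way to read the task-mapping idea is that the assignment of work to workers, and the order in which each worker processes its share, is spelled out inside the tensor program itself rather than derived by a separate scheduler. A toy Python sketch of that reading (all names are invented for illustration and are not Hidet's API):

```python
# Toy sketch of the task-mapping idea (invented names, not Hidet's API): the mapping
# below decides which worker executes which output tile, and in what order.

def spatial_mapping(num_workers, num_tasks):
    """Round-robin task mapping: worker w handles tasks w, w + num_workers, ..."""
    def tasks_of(worker):
        return list(range(worker, num_tasks, num_workers))
    return tasks_of

def matmul_tiles(worker, assign, A, B, C, tile):
    """Each worker computes the output tiles its task mapping assigns to it,
    in the order the mapping specifies."""
    n = len(C)
    tiles_per_row = n // tile
    for task in assign(worker):                   # ordering comes from the mapping
        ti, tj = divmod(task, tiles_per_row)      # task id -> output-tile coordinates
        for i in range(ti * tile, (ti + 1) * tile):
            for j in range(tj * tile, (tj + 1) * tile):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))

# Sequential stand-in for parallel workers (e.g., GPU threads) executing the mapping.
n, tile, workers = 4, 2, 2
A = [[1] * n for _ in range(n)]
B = [[1] * n for _ in range(n)]
C = [[0] * n for _ in range(n)]
assign = spatial_mapping(workers, (n // tile) ** 2)
for w in range(workers):
    matmul_tiles(w, assign, A, B, C, tile)
print(C)   # every entry should be 4
```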
arXiv Detail & Related papers (2022-10-18T05:32:13Z)
- Towards making the most of NLP-based device mapping optimization for OpenCL kernels [5.6596607119831575]
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels.
We propose four different models that provide enhanced contextual information of source codes.
Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv Detail & Related papers (2022-08-30T10:20:55Z)
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
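A toy sketch of neural-guided bottom-up synthesis in this spirit, on a tiny arithmetic DSL: existing programs are combined into larger ones, and a scoring policy (here an error-based stand-in for CrossBeam's neural model) decides which combinations to keep. The DSL, names, and search parameters are illustrative only, not the actual system.

```python
import itertools

# Toy sketch of bottom-up synthesis with a learned-policy stand-in (not CrossBeam's code).

def run(expr, x):
    """Evaluate a tiny arithmetic DSL: an expr is an int literal, 'x', or (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    va, vb = run(a, x), run(b, x)
    return va + vb if op == "+" else va * vb

def score(candidate, examples):
    """Stand-in for the learned policy: negative total error on the I/O examples."""
    return -sum(abs(run(candidate, x) - y) for x, y in examples)

def bottom_up_search(examples, rounds=3, beam=20):
    pool = ["x", 1, 2, 3]                                  # previously-explored programs
    for _ in range(rounds):
        # Combine existing programs into new ones; keep only the best-scoring ones.
        new = [(op, a, b) for op in "+*" for a, b in itertools.product(pool, repeat=2)]
        pool += sorted(new, key=lambda c: score(c, examples), reverse=True)[:beam]
        for c in pool:
            if all(run(c, x) == y for x, y in examples):   # matches all I/O examples
                return c
    return None

print(bottom_up_search([(1, 3), (2, 5), (3, 7)]))          # e.g., discovers a form of 2*x + 1
```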
arXiv Detail & Related papers (2022-03-20T04:41:05Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Towards High Performance Java-based Deep Learning Frameworks [0.22940141855172028]
Modern cloud services have set the demand for fast and efficient data processing.
This demand is common among numerous application domains, such as deep learning, data mining, and computer vision.
In this paper, we employ TornadoVM, a state-of-the-art programming framework, to transparently accelerate Deep Netts, a Java-based deep learning framework.
arXiv Detail & Related papers (2020-01-13T13:03:13Z)