Ansor: Generating High-Performance Tensor Programs for Deep Learning
- URL: http://arxiv.org/abs/2006.06762v5
- Date: Sun, 15 Oct 2023 07:00:36 GMT
- Title: Ansor: Generating High-Performance Tensor Programs for Deep Learning
- Authors: Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer
Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez,
Ion Stoica
- Abstract summary: We present Ansor, a tensor program generation framework for deep learning applications.
Ansor explores many more optimization combinations by sampling programs from a hierarchical representation of the search space.
Ansor can find high-performance programs that are outside the search space of existing state-of-the-art approaches.
- Score: 45.437816016043534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-performance tensor programs are crucial to guarantee efficient execution
of deep neural networks. However, obtaining performant tensor programs for
different operators on various hardware platforms is notoriously challenging.
Currently, deep learning systems rely on vendor-provided kernel libraries or
various search strategies to get performant tensor programs. These approaches
either require significant engineering effort to develop platform-specific
optimization code or fall short of finding high-performance programs due to
restricted search space and ineffective exploration strategy.
We present Ansor, a tensor program generation framework for deep learning
applications. Compared with existing search strategies, Ansor explores many
more optimization combinations by sampling programs from a hierarchical
representation of the search space. Ansor then fine-tunes the sampled programs
with evolutionary search and a learned cost model to identify the best
programs. Ansor can find high-performance programs that are outside the search
space of existing state-of-the-art approaches. In addition, Ansor utilizes a
task scheduler to simultaneously optimize multiple subgraphs in deep neural
networks. We show that Ansor improves the execution performance of deep neural
networks relative to the state-of-the-art on the Intel CPU, ARM CPU, and NVIDIA
GPU by up to $3.8\times$, $2.6\times$, and $1.7\times$, respectively.
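To make the search strategy described above concrete, here is a minimal, illustrative Python sketch of the two-stage idea: sample complete programs from a hierarchical space (a high-level sketch plus randomly filled low-level details), then fine-tune the samples with evolutionary search ranked by a cost model. All names, the schedule knobs, and the analytic "cost model" below are hypothetical stand-ins, not Ansor's actual implementation or API.

```python
import random

# Illustrative sketch of Ansor-style search (hypothetical structures, not the real API).
# A "program" is a dict of schedule decisions; the real system manipulates tensor IR.

SKETCHES = ["tile-fuse-reorder", "tile-cache-write", "rfactor"]   # high-level structures
CHOICES = {                                                       # low-level details
    "tile_i": [4, 8, 16, 32],
    "tile_j": [4, 8, 16, 32],
    "unroll": [0, 16, 64],
}

def sample_program():
    """Hierarchical sampling: pick a high-level sketch, then fill low-level details randomly."""
    program = {"sketch": random.choice(SKETCHES)}
    program.update({k: random.choice(v) for k, v in CHOICES.items()})
    return program

def mutate(program):
    """Evolutionary mutation: perturb a single schedule decision."""
    child = dict(program)
    key = random.choice(list(CHOICES))
    child[key] = random.choice(CHOICES[key])
    return child

def predicted_cost(program):
    """Stand-in for the learned cost model (a made-up analytic score, lower is better)."""
    return 1.0 / (program["tile_i"] * program["tile_j"]) + 0.001 * program["unroll"]

def evolutionary_search(population, generations=5, size=32):
    """Fine-tune sampled programs: mutate candidates, keep the lowest predicted cost."""
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(size)]
        population = sorted(population + children, key=predicted_cost)[:size]
    return population

if __name__ == "__main__":
    initial = [sample_program() for _ in range(32)]   # sample from the hierarchical space
    best = evolutionary_search(initial)[0]            # refine with evolutionary search
    print("best candidate:", best)
```

In the full system, the best candidates from each round are compiled and measured on real hardware and the measurements are used to update the learned cost model; that feedback loop is elided in this sketch for brevity.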
Related papers
- Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent [48.791943145735]
We show the potential to reduce Ansor's search time while enhancing kernel quality.
We apply this approach to the first 300 kernels that Ansor generates.
This result has been replicated in 20 well-known deep-learning models.
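As a rough illustration of the coordinate-descent idea (the schedule representation, parameter names, and measurement function below are hypothetical, not the paper's actual setup), fine-tuning can be sketched as optimizing one schedule knob at a time:

```python
def coordinate_descent(schedule, candidates, measure):
    """Fine-tune one parameter (coordinate) at a time: hold the rest fixed, try each
    candidate value, keep whichever measures fastest; stop when a full sweep finds
    no improvement. `schedule` is a dict of knobs, `measure` times it (here, a toy)."""
    best_time = measure(schedule)
    improved = True
    while improved:
        improved = False
        for param, values in candidates.items():
            for value in values:
                trial = dict(schedule, **{param: value})
                t = measure(trial)
                if t < best_time:
                    schedule, best_time, improved = trial, t, True
    return schedule, best_time

# Toy usage with a synthetic cost function standing in for real hardware measurements.
toy_cost = lambda s: abs(s["tile"] - 16) + abs(s["unroll"] - 64) / 8
best, runtime = coordinate_descent({"tile": 4, "unroll": 0},
                                   {"tile": [4, 8, 16, 32], "unroll": [0, 16, 64]},
                                   toy_cost)
print(best, runtime)
```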
arXiv Detail & Related papers (2024-06-28T16:34:22Z)
- HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks [51.71682428015139]
We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves tensor operator performance by 22% and search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
arXiv Detail & Related papers (2022-11-21T04:15:27Z)
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs [11.338285393619042]
We propose to embed the scheduling process into tensor programs and use dedicated mappings, called task mappings, to define the computation assignment and ordering.
With the proposed paradigm, we implement a deep learning compiler - Hidet.
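One way to read the task-mapping idea is that the assignment of work to workers, and the order in which each worker processes its share, is spelled out inside the tensor program itself rather than derived by a separate scheduler. A toy Python sketch of that reading (all names are invented for illustration and are not Hidet's API):

```python
# Toy sketch of the task-mapping idea (invented names, not Hidet's API): the mapping
# below decides which worker executes which output tile, and in what order.

def spatial_mapping(num_workers, num_tasks):
    """Round-robin task mapping: worker w handles tasks w, w + num_workers, ..."""
    def tasks_of(worker):
        return list(range(worker, num_tasks, num_workers))
    return tasks_of

def matmul_tiles(worker, assign, A, B, C, tile):
    """Each worker computes the output tiles its task mapping assigns to it,
    in the order the mapping specifies."""
    n = len(C)
    tiles_per_row = n // tile
    for task in assign(worker):                   # ordering comes from the mapping
        ti, tj = divmod(task, tiles_per_row)      # task id -> output-tile coordinates
        for i in range(ti * tile, (ti + 1) * tile):
            for j in range(tj * tile, (tj + 1) * tile):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))

# Sequential stand-in for parallel workers (e.g., GPU threads) executing the mapping.
n, tile, workers = 4, 2, 2
A = [[1] * n for _ in range(n)]
B = [[1] * n for _ in range(n)]
C = [[0] * n for _ in range(n)]
assign = spatial_mapping(workers, (n // tile) ** 2)
for w in range(workers):
    matmul_tiles(w, assign, A, B, C, tile)
print(C)   # every entry should be 4
```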
arXiv Detail & Related papers (2022-10-18T05:32:13Z)
- Towards making the most of NLP-based device mapping optimization for OpenCL kernels [5.6596607119831575]
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels.
We propose four different models that provide enhanced contextual information of source codes.
Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv Detail & Related papers (2022-08-30T10:20:55Z)
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
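A toy sketch of neural-guided bottom-up synthesis in this spirit, on a tiny arithmetic DSL: existing programs are combined into larger ones, and a scoring policy (here an error-based stand-in for CrossBeam's neural model) decides which combinations to keep. The DSL, names, and search parameters are illustrative only, not the actual system.

```python
import itertools

# Toy sketch of bottom-up synthesis with a learned-policy stand-in (not CrossBeam's code).

def run(expr, x):
    """Evaluate a tiny arithmetic DSL: an expr is an int literal, 'x', or (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    va, vb = run(a, x), run(b, x)
    return va + vb if op == "+" else va * vb

def score(candidate, examples):
    """Stand-in for the learned policy: negative total error on the I/O examples."""
    return -sum(abs(run(candidate, x) - y) for x, y in examples)

def bottom_up_search(examples, rounds=3, beam=20):
    pool = ["x", 1, 2, 3]                                  # previously-explored programs
    for _ in range(rounds):
        # Combine existing programs into new ones; keep only the best-scoring ones.
        new = [(op, a, b) for op in "+*" for a, b in itertools.product(pool, repeat=2)]
        pool += sorted(new, key=lambda c: score(c, examples), reverse=True)[:beam]
        for c in pool:
            if all(run(c, x) == y for x, y in examples):   # matches all I/O examples
                return c
    return None

print(bottom_up_search([(1, 3), (2, 5), (3, 7)]))          # e.g., discovers a form of 2*x + 1
```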
arXiv Detail & Related papers (2022-03-20T04:41:05Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Towards High Performance Java-based Deep Learning Frameworks [0.22940141855172028]
Modern cloud services have set the demand for fast and efficient data processing.
This demand is common among numerous application domains, such as deep learning, data mining, and computer vision.
In this paper, we employ TornadoVM, a state-of-the-art programming framework, to transparently accelerate Deep Netts, a Java-based deep learning framework.
arXiv Detail & Related papers (2020-01-13T13:03:13Z)