HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
- URL: http://arxiv.org/abs/2211.11172v1
- Date: Mon, 21 Nov 2022 04:15:27 GMT
- Title: HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
- Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang
- Abstract summary: We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves the tensor operator performance by 22% and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
- Score: 51.71682428015139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To perform inference with neural networks efficiently, the underlying
tensor programs require substantial tuning effort before being deployed into
production environments. Usually, an enormous number of tensor program candidates
must be explored to find the best-performing one. This is necessary for neural
network products to meet the demands of real-world applications such as natural
language processing and autonomous driving.
Auto-schedulers are being developed to avoid the need for human intervention.
However, due to the gigantic search space and lack of intelligent search
guidance, current auto-schedulers require hours to days of tuning time to find
the best-performing tensor program for the entire neural network.
In this paper, we propose HARL, a reinforcement learning (RL) based
auto-scheduler specifically designed for efficient tensor program exploration.
HARL uses a hierarchical RL architecture in which learning-based decisions are
made at all levels of search granularity. It also automatically
adjusts exploration configurations in real-time for faster performance
convergence. As a result, HARL improves the tensor operator performance by 22%
and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on
end-to-end neural networks.
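As a rough, hypothetical illustration of the hierarchical idea (not HARL's actual implementation; the templates, cost model, and update rule below are invented for exposition), a two-level bandit-style search might look like this:
```python
import random

random.seed(0)
templates = ["tile_8", "tile_16", "tile_32"]                 # coarse level
q_high = {t: 0.0 for t in templates}
q_low = {(t, u): 0.0 for t in templates for u in (1, 2, 4)}  # fine level
eps, lr = 0.5, 0.1

def measure(t, u):
    # Stand-in for compiling and benchmarking a tensor program.
    return -abs(templates.index(t) - 1) - abs(u - 2) + random.random()

for step in range(200):
    # High-level decision: which schedule template to explore.
    t = (random.choice(templates) if random.random() < eps
         else max(q_high, key=q_high.get))
    # Low-level decision: a parameter choice within that template.
    u = (random.choice([1, 2, 4]) if random.random() < eps
         else max((k for k in q_low if k[0] == t), key=q_low.get)[1])
    r = measure(t, u)
    q_high[t] += lr * (r - q_high[t])            # update coarse value
    q_low[(t, u)] += lr * (r - q_low[(t, u)])    # update fine value
    eps = max(0.05, eps * 0.99)                  # adapt exploration online
```
The point of the hierarchy is that coarse decisions (which template) and fine decisions (which parameters) are both learned rather than enumerated, and the exploration rate is adjusted as performance converges.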
Related papers
- TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation [19.009600866053923]
We present a model parallelism framework TAP that automatically searches for the best data and tensor parallel schedules.
Experiments show that TAP is $20\times$-$160\times$ faster than the state-of-the-art automatic parallelism framework.
arXiv Detail & Related papers (2023-02-01T05:22:28Z)
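To make the schedule-search idea concrete, here is a minimal, hypothetical sketch (the cost model is an invented stand-in, not TAP's): enumerate data/tensor parallel degree pairs that fill the device count and pick the cheapest.
```python
from itertools import product

NUM_DEVICES = 8

def toy_cost(dp, tp):
    compute = 100.0 / (dp * tp)              # idealized perfect scaling
    comm = 2.0 * (tp - 1) + 0.5 * (dp - 1)   # tensor parallel talks more
    return compute + comm

# Keep only schedules that exactly fill the available devices.
candidates = [(dp, tp) for dp, tp in product([1, 2, 4, 8], repeat=2)
              if dp * tp == NUM_DEVICES]
best = min(candidates, key=lambda s: toy_cost(*s))
print(best)  # (4, 2) under this toy model
```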
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
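snnTorch is a real PyTorch-based package; below is a minimal sketch of its typical leaky integrate-and-fire usage on CPU/GPU (the IPU-optimized release targets Graphcore hardware, which this sketch does not cover; layer sizes are arbitrary).
```python
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(100, 10)
lif = snn.Leaky(beta=0.9)          # leaky integrate-and-fire neurons
mem = lif.init_leaky()             # initial membrane potential

x = torch.rand(25, 1, 100)         # (time steps, batch, features)
spikes = []
for t in range(x.size(0)):         # iterate over time
    spk, mem = lif(fc(x[t]), mem)  # integrate current, emit spikes
    spikes.append(spk)
out = torch.stack(spikes)          # (time, batch, neurons) spike train
```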
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
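A toy sketch of the forward-in-time idea: replace backprop through time with an online presynaptic trace and a local, instantaneous weight update. The decay constant, threshold, and error signal below are illustrative assumptions, not the paper's exact OTTT rule.
```python
import torch

torch.manual_seed(0)
W = torch.zeros(10, 100)          # weights of one linear spiking layer
lam, lr = 0.9, 1e-2               # trace decay and learning rate
trace = torch.zeros(100)          # presynaptic eligibility trace
mem = torch.zeros(10)             # membrane potentials

for t in range(50):
    s_in = (torch.rand(100) < 0.1).float()   # random input spikes
    trace = lam * trace + s_in               # forward-in-time trace update
    mem = lam * mem + W @ s_in               # leaky integration
    s_out = (mem > 1.0).float()              # fire on threshold crossing
    mem = mem - s_out                        # soft reset after spiking
    target = torch.zeros(10)                 # toy target: stay silent
    err = s_out - target                     # instantaneous error signal
    W = W - lr * err.unsqueeze(1) * trace.unsqueeze(0)  # local update, no BPTT
```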
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models [90.6485663020735]
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
We propose a joint Neural Architecture Search and Online Adaptation framework named NASOA for faster task-oriented fine-tuning.
arXiv Detail & Related papers (2021-08-07T12:03:14Z)
- Smart Scheduling based on Deep Reinforcement Learning for Cellular Networks [18.04856086228028]
We propose a smart scheduling scheme based on deep reinforcement learning (DRL).
We provide implementation-friendly designs, i.e., a scalable neural network design for the agent and a virtual environment training framework.
We show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.
arXiv Detail & Related papers (2021-03-22T02:09:16Z)
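One common way to obtain a "scalable neural network design" like the one mentioned above is a per-user scoring network shared across all users, so the agent handles any number of users; the feature layout and sizes below are illustrative assumptions, not the paper's design.
```python
import torch
import torch.nn as nn

class SchedulerAgent(nn.Module):
    """Scores every user with one shared network, so the agent
    scales to any number of users (a permutation-invariant design)."""
    def __init__(self, feat_dim=4):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, user_feats):                 # (num_users, feat_dim)
        return self.score(user_feats).squeeze(-1)  # one score per user

agent = SchedulerAgent()
feats = torch.rand(8, 4)          # e.g. channel quality, queue length, ...
action = agent(feats).argmax()    # schedule the highest-scoring user
```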
- Superiorities of Deep Extreme Learning Machines against Convolutional Neural Networks [3.04585143845864]
Deep Learning (DL) is a machine learning approach for artificial intelligence that analyzes input data in detail.
DL has gained popularity thanks to advances in graphical processing unit capabilities.
Deep Extreme Learning Machines (Deep ELM) are among the fastest and most effective approaches to fast classification problems.
arXiv Detail & Related papers (2021-01-21T08:22:18Z)
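For context, a classical single-layer Extreme Learning Machine trains in one shot: hidden weights are random and fixed, and output weights are solved by least squares (Deep ELM stacks such layers). A minimal NumPy sketch with toy data:
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))             # training inputs
Y = (X[:, :1] > 0).astype(float)               # toy binary targets

W_in = rng.standard_normal((10, 64))           # random hidden weights, never trained
H = np.tanh(X @ W_in)                          # hidden-layer activations
W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)  # closed-form output weights

pred = (np.tanh(X @ W_in) @ W_out > 0.5)       # one-shot "trained" classifier
```
The speed claim comes from the absence of iterative gradient descent: training reduces to one linear solve.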
- Scheduling Real-time Deep Learning Services as Imprecise Computations [11.611969843191433]
The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services.
These services perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision.
We show that deep neural networks can be cast as imprecise computations, each with a mandatory part and several optional parts.
arXiv Detail & Related papers (2020-11-02T16:43:04Z)
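A minimal sketch of the imprecise-computation view: run the mandatory part, then optional refinement stages only while the deadline allows. The stage functions here are hypothetical placeholders, not the paper's scheduler.
```python
import time

def run_anytime(x, mandatory, optional_stages, deadline_s):
    """Run the mandatory part, then refine while the deadline allows."""
    start = time.monotonic()
    result = mandatory(x)                      # mandatory part: always runs
    for stage in optional_stages:              # optional parts
        if time.monotonic() - start > deadline_s:
            break                              # degrade gracefully, never miss
        result = stage(result)                 # each stage refines the answer
    return result

# Toy usage: each "stage" just sharpens a numeric estimate.
coarse = lambda x: round(x, 1)
stages = [lambda r: r + 0.04, lambda r: r + 0.001]
print(run_anytime(3.14159, coarse, stages, deadline_s=0.05))
```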
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
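A stripped-down sketch of the evolutionary half of the idea (EGRL additionally uses a graph-based RL policy to guide the search, omitted here); the cost model, tensor count, and mutation rate are invented for illustration.
```python
import random

random.seed(0)
LEVELS = ["sram", "dram"]          # candidate memory levels per tensor
NUM_TENSORS = 12

def cost(placement):               # toy stand-in for on-chip profiling
    return sum(1.0 if m == "dram" else 0.2 for m in placement)

def mutate(p, rate=0.2):           # flip some assignments at random
    return [random.choice(LEVELS) if random.random() < rate else m
            for m in p]

pop = [[random.choice(LEVELS) for _ in range(NUM_TENSORS)]
       for _ in range(16)]
for gen in range(30):
    pop.sort(key=cost)                                        # rank by cost
    pop = pop[:8] + [mutate(random.choice(pop[:8])) for _ in range(8)]
best = min(pop, key=cost)          # cheapest memory placement found
```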
- Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms [0.0]
We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to determine the learning rate schedule for a selection of popular neural network training algorithms.
GOLS-I's learning rate schedules are competitive with manually tuned learning rates across seven optimization algorithms, three types of neural network architectures, 23 datasets, and two loss functions.
arXiv Detail & Related papers (2020-06-29T08:59:31Z)
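The core of a gradient-only line search: instead of comparing noisy loss values, step along the descent direction until the directional derivative changes sign, which brackets a minimum. A toy sketch on a quadratic (the step-growth constants are illustrative, not GOLS-I's exact rule):
```python
import numpy as np

def gols_step(theta, d, grad_fn, alpha=1e-2, grow=2.0, max_iter=20):
    """Grow the step until the directional derivative turns positive."""
    for _ in range(max_iter):
        slope = grad_fn(theta + alpha * d) @ d   # directional derivative
        if slope >= 0:                           # sign change: minimum passed
            return alpha
        alpha *= grow                            # still descending, step bigger
    return alpha

grad_fn = lambda th: 2 * (th - 3.0)              # toy quadratic gradient
theta = np.array([0.0])
d = -grad_fn(theta)                              # descent direction
alpha = gols_step(theta, d, grad_fn)             # learning rate from gradients only
theta = theta + alpha * d
```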
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.