HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
- URL: http://arxiv.org/abs/2211.11172v1
- Date: Mon, 21 Nov 2022 04:15:27 GMT
- Title: HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
- Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang
- Abstract summary: We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves the tensor operator performance by 22% and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
- Score: 51.71682428015139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To perform inference with neural networks efficiently, the underlying
tensor programs require substantial tuning effort before being deployed into
production environments. Usually, an enormous number of tensor program candidates
must be explored to find the best-performing one. This is necessary for neural
network products to meet the demands of real-world applications such as natural
language processing and autonomous driving.
Auto-schedulers are being developed to avoid the need for human intervention.
However, due to the gigantic search space and lack of intelligent search
guidance, current auto-schedulers require hours to days of tuning time to find
the best-performing tensor program for the entire neural network.
In this paper, we propose HARL, a reinforcement learning (RL) based
auto-scheduler specifically designed for efficient tensor program exploration.
HARL uses a hierarchical RL architecture in which learning-based decisions are
made at all levels of search granularity. It also automatically
adjusts exploration configurations in real-time for faster performance
convergence. As a result, HARL improves the tensor operator performance by 22%
and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on
end-to-end neural networks.
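As a rough, hypothetical illustration of the hierarchical idea (not HARL's actual implementation; the templates, cost model, and update rule below are invented for exposition), a two-level bandit-style search might look like this:
```python
import random

random.seed(0)
templates = ["tile_8", "tile_16", "tile_32"]                 # coarse level
q_high = {t: 0.0 for t in templates}
q_low = {(t, u): 0.0 for t in templates for u in (1, 2, 4)}  # fine level
eps, lr = 0.5, 0.1

def measure(t, u):
    # Stand-in for compiling and benchmarking a tensor program.
    return -abs(templates.index(t) - 1) - abs(u - 2) + random.random()

for step in range(200):
    # High-level decision: which schedule template to explore.
    t = (random.choice(templates) if random.random() < eps
         else max(q_high, key=q_high.get))
    # Low-level decision: a parameter choice within that template.
    u = (random.choice([1, 2, 4]) if random.random() < eps
         else max((k for k in q_low if k[0] == t), key=q_low.get)[1])
    r = measure(t, u)
    q_high[t] += lr * (r - q_high[t])            # update coarse value
    q_low[(t, u)] += lr * (r - q_low[(t, u)])    # update fine value
    eps = max(0.05, eps * 0.99)                  # adapt exploration online
```
The point of the hierarchy is that coarse decisions (which template) and fine decisions (which parameters) are both learned rather than enumerated, and the exploration rate is adjusted as performance converges.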
Related papers
- TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation [19.009600866053923]
We present a model parallelism framework TAP that automatically searches for the best data and tensor parallel schedules.
Experiments show that TAP is $20\times$-$160\times$ faster than the state-of-the-art automatic parallelism framework.
arXiv Detail & Related papers (2023-02-01T05:22:28Z)
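To make the schedule-search idea concrete, here is a minimal, hypothetical sketch (the cost model is an invented stand-in, not TAP's): enumerate data/tensor parallel degree pairs that fill the device count and pick the cheapest.
```python
from itertools import product

NUM_DEVICES = 8

def toy_cost(dp, tp):
    compute = 100.0 / (dp * tp)              # idealized perfect scaling
    comm = 2.0 * (tp - 1) + 0.5 * (dp - 1)   # tensor parallel talks more
    return compute + comm

# Keep only schedules that exactly fill the available devices.
candidates = [(dp, tp) for dp, tp in product([1, 2, 4, 8], repeat=2)
              if dp * tp == NUM_DEVICES]
best = min(candidates, key=lambda s: toy_cost(*s))
print(best)  # (4, 2) under this toy model
```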
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
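snnTorch is a real PyTorch-based package; below is a minimal sketch of its typical leaky integrate-and-fire usage on CPU/GPU (the IPU-optimized release targets Graphcore hardware, which this sketch does not cover; layer sizes are arbitrary).
```python
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(100, 10)
lif = snn.Leaky(beta=0.9)          # leaky integrate-and-fire neurons
mem = lif.init_leaky()             # initial membrane potential

x = torch.rand(25, 1, 100)         # (time steps, batch, features)
spikes = []
for t in range(x.size(0)):         # iterate over time
    spk, mem = lif(fc(x[t]), mem)  # integrate current, emit spikes
    spikes.append(spk)
out = torch.stack(spikes)          # (time, batch, neurons) spike train
```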
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
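A toy sketch of the forward-in-time idea: replace backprop through time with an online presynaptic trace and a local, instantaneous weight update. The decay constant, threshold, and error signal below are illustrative assumptions, not the paper's exact OTTT rule.
```python
import torch

torch.manual_seed(0)
W = torch.zeros(10, 100)          # weights of one linear spiking layer
lam, lr = 0.9, 1e-2               # trace decay and learning rate
trace = torch.zeros(100)          # presynaptic eligibility trace
mem = torch.zeros(10)             # membrane potentials

for t in range(50):
    s_in = (torch.rand(100) < 0.1).float()   # random input spikes
    trace = lam * trace + s_in               # forward-in-time trace update
    mem = lam * mem + W @ s_in               # leaky integration
    s_out = (mem > 1.0).float()              # fire on threshold crossing
    mem = mem - s_out                        # soft reset after spiking
    target = torch.zeros(10)                 # toy target: stay silent
    err = s_out - target                     # instantaneous error signal
    W = W - lr * err.unsqueeze(1) * trace.unsqueeze(0)  # local update, no BPTT
```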
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models [90.6485663020735]
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
We propose a joint Neural Architecture Search and Online Adaptation framework named NASOA for faster task-oriented fine-tuning.
arXiv Detail & Related papers (2021-08-07T12:03:14Z)
- Smart Scheduling based on Deep Reinforcement Learning for Cellular Networks [18.04856086228028]
We propose a smart scheduling scheme based on deep reinforcement learning (DRL).
We provide implementation-friendly designs, i.e., a scalable neural network design for the agent and a virtual environment training framework.
We show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.
arXiv Detail & Related papers (2021-03-22T02:09:16Z)
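One common way to obtain a "scalable neural network design" like the one mentioned above is a per-user scoring network shared across all users, so the agent handles any number of users; the feature layout and sizes below are illustrative assumptions, not the paper's design.
```python
import torch
import torch.nn as nn

class SchedulerAgent(nn.Module):
    """Scores every user with one shared network, so the agent
    scales to any number of users (a permutation-invariant design)."""
    def __init__(self, feat_dim=4):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, user_feats):                 # (num_users, feat_dim)
        return self.score(user_feats).squeeze(-1)  # one score per user

agent = SchedulerAgent()
feats = torch.rand(8, 4)          # e.g. channel quality, queue length, ...
action = agent(feats).argmax()    # schedule the highest-scoring user
```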
- Superiorities of Deep Extreme Learning Machines against Convolutional Neural Networks [3.04585143845864]
Deep Learning (DL) is a machine learning approach for artificial intelligence that analyzes input data in detail.
DL has gained popularity thanks to advances in graphical processing unit capabilities.
Deep Extreme Learning Machines (Deep ELM) are among the fastest and most effective approaches to fast classification problems.
arXiv Detail & Related papers (2021-01-21T08:22:18Z)
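For context, a classical single-layer Extreme Learning Machine trains in one shot: hidden weights are random and fixed, and output weights are solved by least squares (Deep ELM stacks such layers). A minimal NumPy sketch with toy data:
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))             # training inputs
Y = (X[:, :1] > 0).astype(float)               # toy binary targets

W_in = rng.standard_normal((10, 64))           # random hidden weights, never trained
H = np.tanh(X @ W_in)                          # hidden-layer activations
W_out, *_ = np.linalg.lstsq(H, Y, rcond=None)  # closed-form output weights

pred = (np.tanh(X @ W_in) @ W_out > 0.5)       # one-shot "trained" classifier
```
The speed claim comes from the absence of iterative gradient descent: training reduces to one linear solve.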
- Scheduling Real-time Deep Learning Services as Imprecise Computations [11.611969843191433]
The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services.
These services perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision.
We show that deep neural networks can be cast as imprecise computations, each with a mandatory part and several optional parts.
arXiv Detail & Related papers (2020-11-02T16:43:04Z)
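A minimal sketch of the imprecise-computation view: run the mandatory part, then optional refinement stages only while the deadline allows. The stage functions here are hypothetical placeholders, not the paper's scheduler.
```python
import time

def run_anytime(x, mandatory, optional_stages, deadline_s):
    """Run the mandatory part, then refine while the deadline allows."""
    start = time.monotonic()
    result = mandatory(x)                      # mandatory part: always runs
    for stage in optional_stages:              # optional parts
        if time.monotonic() - start > deadline_s:
            break                              # degrade gracefully, never miss
        result = stage(result)                 # each stage refines the answer
    return result

# Toy usage: each "stage" just sharpens a numeric estimate.
coarse = lambda x: round(x, 1)
stages = [lambda r: r + 0.04, lambda r: r + 0.001]
print(run_anytime(3.14159, coarse, stages, deadline_s=0.05))
```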
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
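A stripped-down sketch of the evolutionary half of the idea (EGRL additionally uses a graph-based RL policy to guide the search, omitted here); the cost model, tensor count, and mutation rate are invented for illustration.
```python
import random

random.seed(0)
LEVELS = ["sram", "dram"]          # candidate memory levels per tensor
NUM_TENSORS = 12

def cost(placement):               # toy stand-in for on-chip profiling
    return sum(1.0 if m == "dram" else 0.2 for m in placement)

def mutate(p, rate=0.2):           # flip some assignments at random
    return [random.choice(LEVELS) if random.random() < rate else m
            for m in p]

pop = [[random.choice(LEVELS) for _ in range(NUM_TENSORS)]
       for _ in range(16)]
for gen in range(30):
    pop.sort(key=cost)                                        # rank by cost
    pop = pop[:8] + [mutate(random.choice(pop[:8])) for _ in range(8)]
best = min(pop, key=cost)          # cheapest memory placement found
```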
- Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms [0.0]
We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to determine the learning rate schedule for a selection of popular neural network training algorithms.
GOLS-I's learning rate schedules are competitive with manually tuned learning rates across seven optimization algorithms, three types of neural network architectures, 23 datasets, and two loss functions.
arXiv Detail & Related papers (2020-06-29T08:59:31Z)
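The core of a gradient-only line search: instead of comparing noisy loss values, step along the descent direction until the directional derivative changes sign, which brackets a minimum. A toy sketch on a quadratic (the step-growth constants are illustrative, not GOLS-I's exact rule):
```python
import numpy as np

def gols_step(theta, d, grad_fn, alpha=1e-2, grow=2.0, max_iter=20):
    """Grow the step until the directional derivative turns positive."""
    for _ in range(max_iter):
        slope = grad_fn(theta + alpha * d) @ d   # directional derivative
        if slope >= 0:                           # sign change: minimum passed
            return alpha
        alpha *= grow                            # still descending, step bigger
    return alpha

grad_fn = lambda th: 2 * (th - 3.0)              # toy quadratic gradient
theta = np.array([0.0])
d = -grad_fn(theta)                              # descent direction
alpha = gols_step(theta, d, grad_fn)             # learning rate from gradients only
theta = theta + alpha * d
```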
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.