Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to
Infer Hardware Performances
- URL: http://arxiv.org/abs/2205.04586v1
- Date: Mon, 9 May 2022 22:48:39 GMT
- Title: Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to
Infer Hardware Performances
- Authors: Ian Frederick Vigogne Goodbody Hunter, Alessandro Palla, Sebastian
Eusebiu Nagy, Richard Richmond and Kyle McAdoo
- Abstract summary: 'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
- Score: 58.720142291102135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Calculating the most efficient schedule of work in a neural network compiler
is a difficult task. There are many parameters to be accounted for that can
positively or adversely affect that schedule depending on their configuration:
how work is shared between distributed targets, the subdivision of tensors to
fit in memory, toggling the enablement of optimizations, etc. Traditionally,
neural network compilers determine how to set these values by building a graph
of choices and choosing the path with minimal 'cost'. These choices and their
corresponding costs are usually determined by an algorithm crafted by engineers
with a deep knowledge of the target platform. However, when the number of
options available to a compiler is large, it is very difficult to ensure that
these models consistently produce an optimal schedule for all scenarios, whilst
still completing compilation in an acceptable timeframe. This paper presents
'VPUNN' - a neural network-based cost model trained on low-level task profiling
that consistently outperforms the state-of-the-art cost modeling in Intel's
line of VPU processors.
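As a rough illustration of the approach described in the abstract, the sketch below trains a small network to map hardware task descriptors to predicted cycle counts and then uses it to rank candidate schedule configurations. This is a minimal sketch only: the feature encoding, network shape, and synthetic training data are illustrative assumptions, not the published VPUNN architecture or API.
```python
# Minimal sketch of a learned compiler cost model, in the spirit of VPUNN.
# All shapes, features, and data here are illustrative placeholders.
import torch
import torch.nn as nn

class CostModel(nn.Module):
    """Maps a fixed-length descriptor of a hardware task to predicted cycles."""
    def __init__(self, n_features: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted cycle count
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def pick_best_schedule(model: CostModel, candidates: torch.Tensor) -> int:
    """Rank candidate schedule configurations by predicted cost.
    `candidates` is (num_candidates, n_features); returns the cheapest index."""
    with torch.no_grad():
        return int(model(candidates).argmin())

# Train against profiled cycle counts (task descriptors -> measured cycles).
model = CostModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.rand(1024, 16)    # stand-in for low-level task descriptors
measured = torch.rand(1024) * 1e4  # stand-in for profiled cycle counts
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(features), measured)
    loss.backward()
    opt.step()

print("cheapest candidate:", pick_best_schedule(model, torch.rand(8, 16)))
```
Once trained, such a model replaces the hand-crafted cost function at each decision point in the compiler's graph of choices.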
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision mechanism to accelerate inference by dynamically assigning resources to each data instance.
Our method incurs less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
RL generates near-optimal scheduling results with short solving runtime overhead.
Our framework has demonstrated up to $\sim 2.5\times$ real-world on-chip runtime inference speedups over the commercial compiler.
arXiv Detail & Related papers (2023-04-10T17:22:12Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for the automatic design of models.
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Optimising the Performance of Convolutional Neural Networks across Computing Systems using Transfer Learning [0.08594140167290096]
We propose to replace a lengthy profiling stage with a machine learning based approach to performance modeling.
After training, our performance model can estimate the performance of convolutional primitives in any layer configuration.
The time to optimise the execution of large neural networks via primitive selection is reduced from hours to just seconds; a minimal sketch of this selection step appears after this list.
arXiv Detail & Related papers (2020-10-20T20:58:27Z)
- A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z)
- Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction [0.0]
We propose lightweight augmented neural networks for arbitrary combinations of kernel-variant-hardware.
We are able to obtain a low MAPE of 3%, significantly outperforming traditional feed-forward neural networks.
Our variant-selection approach can be used in Halide implementations to obtain up to 1.7x speedup over Halide's auto-scheduler.
arXiv Detail & Related papers (2020-03-17T02:19:54Z)
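Several entries above (the transfer-learning performance model, the learned TPU model, and VPUNN itself) share one mechanism: query a trained predictor instead of profiling every candidate. Below is a hypothetical sketch of the primitive-selection step referenced in the transfer-learning entry; the layer descriptor, the primitive names, and the dummy predictor are placeholders, not that paper's interface.
```python
# Hypothetical sketch of learned-model-driven primitive selection.
# `predict` stands in for a trained performance model; everything here
# is illustrative rather than any paper's actual API.
from typing import Callable, Dict

def select_primitive(layer: Dict[str, int],
                     predict: Callable[[str, Dict[str, int]], float],
                     primitives=("direct", "im2col", "winograd", "fft")) -> str:
    """Return the convolution primitive with the lowest predicted runtime,
    replacing an exhaustive profiling sweep with cheap model queries."""
    return min(primitives, key=lambda p: predict(p, layer))

# Usage with a dummy predictor; a real one would be trained on benchmark
# measurements gathered across target machines.
layer = {"channels_in": 64, "channels_out": 128, "kernel": 3, "spatial": 56}
dummy_predict = lambda prim, cfg: hash((prim, tuple(sorted(cfg.items())))) % 1000
print(select_primitive(layer, dummy_predict))
```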
This list is automatically generated from the titles and abstracts of the papers on this site.