LETI: Latency Estimation Tool and Investigation of Neural Networks
inference on Mobile GPU
- URL: http://arxiv.org/abs/2010.02871v2
- Date: Tue, 27 Jul 2021 17:27:45 GMT
- Title: LETI: Latency Estimation Tool and Investigation of Neural Networks
inference on Mobile GPU
- Authors: Evgeny Ponomarev and Sergey Matveev and Ivan Oseledets
- Abstract summary: In this work, we consider latency approximation on mobile GPU as a data- and hardware-specific problem.
We build open-source tools which provide a convenient way to conduct massive experiments on different target devices.
We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Benchmark 101 dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is commonly used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community uses look-up tables of per-layer latencies to predict total inference time on mobile CPU, which requires only a small number of experiments. Unfortunately, on mobile GPU this method is not applicable in a straightforward way and shows low precision. In this work, we consider latency approximation on mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient latency estimation tool for investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we build open-source tools which provide a convenient way to conduct massive experiments on different target devices, focusing on mobile GPU. After evaluating the dataset, we fit a regression model on the experimental data and use it for subsequent latency prediction and analysis. We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Benchmark 101 dataset and also evaluate the most popular neural network architectures on two mobile GPUs. As a result, we construct a latency prediction model with good precision on the target evaluation subset. We consider LETI a useful tool for neural architecture search or massive latency evaluation. The project is available at https://github.com/leti-ai
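
For illustration only, here is a minimal sketch of the regression-based latency estimation described in the abstract: a regressor is fit on measured latencies of sampled architectures, contrasted with the per-layer look-up-table baseline used on mobile CPU. This is not the authors' LETI implementation; the feature encoding, the choice of GradientBoostingRegressor, and the synthetic data are assumptions made for demonstration.

```python
# Minimal sketch (not the authors' LETI code): predict mobile-GPU latency
# with a regression model trained on measured latencies, versus the
# per-layer look-up-table baseline that works on mobile CPU.
# Feature names, model choice, and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split


def lut_latency(layers, lut):
    """Look-up-table baseline: sum per-layer latencies measured once per
    layer type (the abstract notes this works on mobile CPU but not GPU)."""
    return sum(lut[layer] for layer in layers)


# Hypothetical dataset: each row encodes one architecture (e.g. operation
# counts, tensor shapes, FLOPs); the target is latency in milliseconds
# measured on a specific target device.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 8))                     # placeholder features
y = X @ rng.uniform(1.0, 5.0, size=8) + rng.normal(0.0, 0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0)  # any regressor could stand in
model.fit(X_train, y_train)

mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"MAPE on held-out architectures: {mape:.3f}")
```

In the paper's setting, the features would come from the architecture description and the targets from on-device measurements collected by the benchmarking tool; the regressor above is only a stand-in for whatever model LETI fits per device.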
Related papers
- On Latency Predictors for Neural Architecture Search [8.564763702766776]
We introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets.
We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.
Building on conclusions from our study, we present an end-to-end latency predictor training strategy.
arXiv Detail & Related papers (2024-03-04T19:59:32Z)
- PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [8.272409756443539]
This paper describes PerfSAGE, a novel graph neural network (GNN) that predicts inference latency, energy, and memory footprint on an arbitrary DNN TFLite graph.
Using this dataset, we train PerfSAGE and provide experimental results that demonstrate state-of-the-art prediction accuracy with a Mean Absolute Percentage Error of 5% across all targets and model search spaces.
arXiv Detail & Related papers (2023-01-26T08:59:15Z)
- Tech Report: One-stage Lightweight Object Detectors [0.38073142980733]
This work designs one-stage lightweight detectors that perform well in terms of mAP and latency.
Starting from baseline models targeting GPU and CPU respectively, various operations are applied in place of the main operations in the backbone networks of the baseline models.
arXiv Detail & Related papers (2022-10-31T09:02:37Z)
- Inference Latency Prediction at the Edge [0.3974789827371669]
State-of-the-art neural architectures (NAs) are typically designed through Neural Architecture Search (NAS) to identify NAs with good tradeoffs between accuracy and efficiency.
Since measuring the latency of a huge set of candidate architectures during NAS is not scalable, approaches are needed for predicting end-to-end inference latency on mobile devices.
We propose a latency prediction framework which addresses these challenges by developing operation-wise latency predictors.
arXiv Detail & Related papers (2022-10-06T00:46:06Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare the estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks [73.78551758828294]
LC-NAS is able to find state-of-the-art architectures for point cloud classification with minimal computational cost.
We show how our searched architectures achieve any desired latency with a reasonably low drop in accuracy.
arXiv Detail & Related papers (2020-08-24T10:30:21Z)
- Latency-Aware Differentiable Neural Architecture Search [113.35689580508343]
Differentiable neural architecture search methods have become popular in recent years, mainly due to their low search costs and flexibility in designing the search space.
However, these methods have difficulty optimizing the network, so the searched network is often unfriendly to hardware.
This paper addresses the problem by adding a differentiable latency loss term to the optimization, so that the search process can trade off between accuracy and latency with a balancing coefficient.
arXiv Detail & Related papers (2020-01-17T15:55:21Z)