On Latency Predictors for Neural Architecture Search
- URL: http://arxiv.org/abs/2403.02446v1
- Date: Mon, 4 Mar 2024 19:59:32 GMT
- Title: On Latency Predictors for Neural Architecture Search
- Authors: Yash Akhauri, Mohamed S. Abdelfattah
- Abstract summary: We introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets.
We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.
Building on conclusions from our study, we present an end-to-end latency predictor training strategy.
- Score: 8.564763702766776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient deployment of neural networks (NN) requires the co-optimization of
accuracy and latency. For example, hardware-aware neural architecture search
has been used to automatically find NN architectures that satisfy a latency
constraint on a specific hardware device. Central to these search algorithms is
a prediction model that is designed to provide a hardware latency estimate for
a candidate NN architecture. Recent research has shown that the sample
efficiency of these predictive models can be greatly improved through
pre-training on some \textit{training} devices with many samples, and then
transferring the predictor to the \textit{test} (target) device. Transfer
learning and meta-learning methods have been used for this, but often exhibit
significant performance variability. Additionally, the evaluation of existing
latency predictors has been largely done on hand-crafted training/test device
sets, making it difficult to ascertain design features that compose a robust
and general latency predictor. To address these issues, we introduce a
comprehensive suite of latency prediction tasks obtained in a principled way
through automated partitioning of hardware device sets. We then design a
general latency predictor to comprehensively study (1) the predictor
architecture, (2) NN sample selection methods, (3) hardware device
representations, and (4) NN operation encoding schemes. Building on conclusions
from our study, we present an end-to-end latency predictor training strategy
that outperforms existing methods on 11 out of 12 difficult latency prediction
tasks, improving latency prediction by 22.5\% on average, and up to 87.6\%
on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports
a $5.8\times$ speedup in wall-clock time. Our code is available on
\href{https://github.com/abdelfattah-lab/nasflat_latency}{https://github.com/abdelfattah-lab/nasflat\_latency}.
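As a rough illustration of the pre-train-then-transfer workflow described in the abstract (a minimal sketch only; the MLP predictor, the architecture/device encodings, and the hyperparameters below are illustrative assumptions, not the paper's NASFLAT implementation):

```python
# Hedged sketch: pre-train a latency regressor on abundant (architecture, latency)
# samples from training devices, then fine-tune on a handful of measurements
# from the test (target) device.
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Simple MLP mapping an architecture encoding plus a device embedding
    to a scalar latency estimate (placeholder encodings, not the paper's)."""
    def __init__(self, arch_dim: int, dev_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_dim + dev_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, arch_enc, dev_enc):
        return self.net(torch.cat([arch_enc, dev_enc], dim=-1)).squeeze(-1)

def fit(model, arch, dev, latency, epochs, lr):
    # Plain MSE regression; real pipelines often use ranking losses instead.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(arch, dev), latency)
        loss.backward()
        opt.step()
    return model

# Toy tensors with plausible shapes; real usage would draw architecture
# encodings and measured latencies from a NAS benchmark and hardware devices.
arch_dim, dev_dim = 64, 8
pretrain_arch = torch.randn(2000, arch_dim)   # many samples, training devices
pretrain_dev  = torch.randn(2000, dev_dim)
pretrain_lat  = torch.rand(2000)
target_arch   = torch.randn(20, arch_dim)     # few samples, test device
target_dev    = torch.randn(20, dev_dim)
target_lat    = torch.rand(20)

model = LatencyPredictor(arch_dim, dev_dim)
fit(model, pretrain_arch, pretrain_dev, pretrain_lat, epochs=200, lr=1e-3)  # pre-train
fit(model, target_arch, target_dev, target_lat, epochs=50, lr=1e-4)         # few-shot transfer
```

In the paper's framing, the choices of predictor architecture, sample selection, hardware device representation, and operation encoding are exactly the design axes under study.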
Related papers
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search [10.538869116366415]
We introduce a novel search-space independent NN encoding based on zero-cost proxies that achieves sample-efficient prediction on multiple tasks and NAS search spaces.
Our NN encoding enables multi-search-space transfer of latency predictors from NASBench-201 to FBNet in under 85 HW measurements.
arXiv Detail & Related papers (2023-06-04T20:22:14Z) - Inference Latency Prediction at the Edge [0.3974789827371669]
State-of-the-art neural architectures (NAs) are typically designed through Neural Architecture Search (NAS) to identify NAs with good tradeoffs between accuracy and efficiency.
Since measuring the latency of a huge set of candidate architectures during NAS is not scalable, approaches are needed for predicting end-to-end inference latency on mobile devices.
We propose a latency prediction framework which addresses these challenges by developing operation-wise latency predictors.
arXiv Detail & Related papers (2022-10-06T00:46:06Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling scheme that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z) - Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search [0.0]
We introduce two generalizability strategies, which include fine-tuning using a base model trained on a specific hardware platform and NAS search space.
We provide a family of latency prediction models that achieve over 50% lower RMSE loss compared to ProxylessNAS.
arXiv Detail & Related papers (2021-01-04T00:48:09Z) - LETI: Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU [0.0]
In this work, we consider latency approximation on mobile GPU as a data and hardware-specific problem.
We build open-source tools which provide a convenient way to conduct massive experiments on different target devices.
We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Bench-101 dataset.
arXiv Detail & Related papers (2020-10-06T16:51:35Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Latency-Aware Differentiable Neural Architecture Search [113.35689580508343]
Differentiable neural architecture search methods have become popular in recent years, mainly due to their low search costs and flexibility in designing the search space.
However, these methods have difficulty optimizing the network, so the searched network is often unfriendly to hardware.
This paper addresses the problem by adding a differentiable latency loss term to the optimization, so that the search process can trade off accuracy and latency via a balancing coefficient.
arXiv Detail & Related papers (2020-01-17T15:55:21Z)
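For the latency-aware differentiable search idea summarized in the last entry, a hedged sketch of the combined objective (the function and the expected-latency model below are illustrative assumptions, not that paper's exact formulation):

```python
# Hedged sketch: add a differentiable latency term, weighted by a balancing
# coefficient, to the task loss used during differentiable NAS. The expected
# latency here is modeled as a softmax-weighted sum of per-operation latencies.
import torch

def nas_loss(task_loss: torch.Tensor,
             op_weights: torch.Tensor,   # architecture mixing logits, shape (edges, ops)
             op_latency: torch.Tensor,   # per-operation latency table, same shape
             lam: float = 0.1) -> torch.Tensor:
    expected_latency = (torch.softmax(op_weights, dim=-1) * op_latency).sum()
    return task_loss + lam * expected_latency
```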