HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via
Meta-Learning
- URL: http://arxiv.org/abs/2106.08630v1
- Date: Wed, 16 Jun 2021 08:36:21 GMT
- Title: HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via
Meta-Learning
- Authors: Hayeon Lee, Sewoong Lee, Song Chong, Sung Ju Hwang
- Abstract summary: Hardware-adaptive Efficient Latency Predictor (HELP) formulates device-specific latency estimation as a meta-learning problem.
We introduce novel hardware embeddings that represent any device as a black-box function which outputs latencies, and use them to meta-learn the hardware-adaptive latency predictor in a device-dependent manner.
We validate HELP's latency estimation on unseen platforms, where it achieves high estimation performance with as few as 10 measurement samples, outperforming all relevant baselines.
- Score: 43.751220068642624
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For deployment, neural architecture search should be hardware-aware, in order
to satisfy the device-specific constraints (e.g., memory usage, latency and
energy consumption) and enhance the model efficiency. Existing methods on
hardware-aware NAS collect a large number of samples (e.g., accuracy and
latency) from a target device and either build a lookup table or train a
latency estimator. However, such an approach is impractical in real-world
scenarios, as there exist numerous devices with different hardware
specifications, and collecting samples from such a large number of devices
incurs prohibitive computational and monetary costs. To overcome these
limitations, we propose
Hardware-adaptive Efficient Latency Predictor (HELP), which formulates the
device-specific latency estimation problem as a meta-learning problem, such
that we can estimate the latency of a model for a given task on an unseen
device from only a few samples. To this end, we introduce novel hardware
embeddings that represent any device as a black-box function which outputs
latencies, and meta-learn the hardware-adaptive latency predictor in a
device-dependent manner using these hardware embeddings. We validate the
proposed HELP for its latency estimation performance on unseen platforms, on
which it achieves high estimation performance with as few as 10 measurement
samples, outperforming all relevant baselines. We also validate end-to-end NAS
frameworks that use HELP against ones without it, and show that HELP greatly
reduces the total time cost of the base NAS method in latency-constrained
settings.
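As a rough illustration of this setup, here is a minimal sketch in Python/PyTorch, assuming the hardware embedding is the measured latency of a small, fixed set of reference architectures (the device treated as a black-box function) and assuming a simple Reptile-style meta-update; the layer sizes, helper names, and adaptation loop are hypothetical, not the authors' exact design.

```python
# Hedged sketch of HELP-style few-shot latency prediction (assumptions noted above).
import copy
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    def __init__(self, arch_dim=64, hw_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_dim + hw_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, arch_enc, hw_emb):
        # Condition every prediction on the device embedding.
        hw = hw_emb.expand(len(arch_enc), -1)
        return self.net(torch.cat([arch_enc, hw], dim=-1))

def hardware_embedding(measure_fn, reference_archs):
    # The device is treated as a black-box function from architecture to latency;
    # its outputs on a fixed reference set form the embedding.
    return torch.tensor([float(measure_fn(a)) for a in reference_archs])

def adapt_few_shot(meta_model, hw_emb, archs, latencies, steps=50, lr=1e-2):
    # Few-shot adaptation to an unseen device from a handful of measured samples.
    model = copy.deepcopy(meta_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(archs, hw_emb).squeeze(-1), latencies)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def meta_train(model, device_tasks, meta_lr=0.1, rounds=100):
    # Reptile-style outer loop over devices available at meta-training time.
    for _ in range(rounds):
        for hw_emb, archs, lats in device_tasks:
            adapted = adapt_few_shot(model, hw_emb, archs, lats, steps=5)
            with torch.no_grad():
                for p, q in zip(model.parameters(), adapted.parameters()):
                    p.add_(meta_lr * (q - p))
    return model
```

At deployment time, adapt_few_shot would be called with roughly 10 measured (architecture, latency) pairs from the unseen device, matching the few-shot setting described in the abstract.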
Related papers
- On Latency Predictors for Neural Architecture Search [8.564763702766776]
We introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets.
We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.
Building on conclusions from our study, we present an end-to-end latency predictor training strategy.
arXiv Detail & Related papers (2024-03-04T19:59:32Z)
- Inference Latency Prediction at the Edge [0.3974789827371669]
State-of-the-art neural architectures (NAs) are typically designed through Neural Architecture Search (NAS) to identify NAs with good tradeoffs between accuracy and efficiency.
Since measuring the latency of a huge set of candidate architectures during NAS is not scalable, approaches are needed for predicting end-to-end inference latency on mobile devices.
We propose a latency prediction framework which addresses these challenges by developing operation-wise latency predictors.
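A hedged sketch of the operation-wise idea, assuming end-to-end latency is approximated as the sum of independent per-operation predictions; the operation features and the toy predictors below are stand-ins for learned per-op regressors, not the paper's.

```python
# Hedged sketch: operation-wise latency prediction. Assumption: a network's
# end-to-end latency is approximated by summing independent per-operation
# estimates keyed by op type (real frameworks may also model inter-op effects
# and use richer features).
from typing import Callable, Dict, List, Tuple

Op = Tuple[str, List[float]]  # (op_type, features), e.g. ("conv3x3", [h, w, c_in, c_out, stride])

def predict_network_latency(ops: List[Op],
                            per_op: Dict[str, Callable[[List[float]], float]]) -> float:
    return sum(per_op[op_type](feats) for op_type, feats in ops)

# Toy stand-ins for learned per-op regressors:
per_op = {
    "conv3x3": lambda f: 1e-9 * f[0] * f[1] * f[2] * f[3] * 9,  # rough MACs-based guess
    "relu":    lambda f: 1e-10 * f[0] * f[1] * f[2],
}
net = [("conv3x3", [56, 56, 64, 128, 1]), ("relu", [56, 56, 128])]
print(predict_network_latency(net, per_op))
```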
arXiv Detail & Related papers (2022-10-06T00:46:06Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
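A hedged sketch of how a small set of CPU performance counters could serve as a device descriptor; the counter names and the instruction-count normalization below are assumptions (counter values are assumed to be collected externally, e.g. with Linux perf over a short probe run), not MAPLE-Edge's exact descriptor.

```python
# Hedged sketch: a compact device descriptor built from a few CPU performance
# counters, concatenated with architecture features for a latency regressor.
# The chosen counters and normalization are illustrative assumptions.
from typing import Dict, List

COUNTERS = ["cycles", "instructions", "cache-misses", "branch-misses"]

def device_descriptor(counter_values: Dict[str, float]) -> List[float]:
    # Normalize by instruction count so descriptors are comparable across devices.
    instructions = counter_values.get("instructions", 1.0) or 1.0
    return [counter_values.get(c, 0.0) / instructions for c in COUNTERS]

def predictor_input(arch_features: List[float],
                    counter_values: Dict[str, float]) -> List[float]:
    # The latency regressor sees [architecture features || device descriptor].
    return arch_features + device_descriptor(counter_values)
```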
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that does not only optimize for task accuracy and inference latency, but also for resource utilization.
We achieve 2.8 - 4x speedup for DNN inference compared to prior hardware-aware NAS methods.
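A hedged sketch of a utilization-aware search objective, assuming the task loss is combined with differentiable estimates of inference latency and hardware utilization through weighted penalty and bonus terms; the weights, budget, and function names are illustrative, not U-Boost NAS's actual utilization model.

```python
# Hedged sketch: utilization-aware NAS objective (assumptions noted above).
import torch

def nas_objective(task_loss: torch.Tensor,
                  est_latency_ms: torch.Tensor,
                  est_utilization: torch.Tensor,
                  target_latency_ms: float = 10.0,  # illustrative budget
                  w_lat: float = 0.1,
                  w_util: float = 0.1) -> torch.Tensor:
    # Penalize exceeding the latency budget; reward higher utilization in [0, 1].
    latency_penalty = torch.relu(est_latency_ms - target_latency_ms)
    return task_loss + w_lat * latency_penalty - w_util * est_utilization
```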
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
- One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search [21.50120377137633]
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis.
To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial.
We propose an efficient proxy adaptation technique that significantly boosts latency monotonicity.
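A hedged sketch of quantifying latency monotonicity between a proxy device and a target device as the Spearman rank correlation of latencies measured on a shared set of architectures; the toy numbers are illustrative, and the paper's proxy adaptation technique itself is not shown.

```python
# Hedged sketch: latency monotonicity as Spearman rank correlation.
# A value near 1 means architectures that are fast on the proxy device
# tend to be fast on the target device as well.
from scipy.stats import spearmanr

def latency_monotonicity(proxy_latencies, target_latencies):
    rho, _ = spearmanr(proxy_latencies, target_latencies)
    return rho

# Toy measurements (seconds) for five candidate architectures:
print(latency_monotonicity([0.012, 0.020, 0.015, 0.030, 0.018],
                           [0.034, 0.055, 0.041, 0.080, 0.049]))
```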
arXiv Detail & Related papers (2021-11-01T18:56:42Z)
- LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks [73.78551758828294]
LC-NAS is able to find state-of-the-art architectures for point cloud classification with minimal computational cost.
We show how our searched architectures achieve any desired latency with a reasonably low drop in accuracy.
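A hedged sketch of the selection idea only: picking, from a set of searched candidates, the most accurate architecture that meets a desired latency budget. This generic post-hoc filter is not LC-NAS's differentiable latency-constraint formulation; the candidate names and numbers are made up.

```python
# Hedged sketch: choose the most accurate candidate under a latency budget.
from typing import List, Optional, Tuple

Candidate = Tuple[str, float, float]  # (name, accuracy, latency_ms)

def pick_under_latency(candidates: List[Candidate],
                       latency_budget_ms: float) -> Optional[Candidate]:
    feasible = [c for c in candidates if c[2] <= latency_budget_ms]
    return max(feasible, key=lambda c: c[1]) if feasible else None

# Example: choose the best point-cloud classifier under a 15 ms budget.
print(pick_under_latency([("net_a", 0.91, 22.0), ("net_b", 0.89, 14.0),
                          ("net_c", 0.86, 9.5)], 15.0))
```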
arXiv Detail & Related papers (2020-08-24T10:30:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.