MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
- URL: http://arxiv.org/abs/2205.12660v1
- Date: Wed, 25 May 2022 11:08:20 GMT
- Title: MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
- Authors: Saad Abbasi, Alexander Wong, Mohammad Javad Shafiee
- Abstract summary: Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
- Score: 87.41163540910854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) latency characterization is a time-consuming
process and adds significant cost to Neural Architecture Search (NAS) processes
when searching for efficient convolutional neural networks for embedded vision
applications. DNN latency is a hardware-dependent metric and requires direct
measurement or inference on target hardware. A recently introduced latency
estimation technique known as MAPLE predicts DNN execution time on previously
unseen hardware devices by using hardware performance counters. Leveraging
these hardware counters in the form of an implicit prior, MAPLE achieves
state-of-the-art performance in latency prediction. Here, we propose MAPLE-X
which extends MAPLE by incorporating explicit prior knowledge of hardware
devices and DNN architecture latency to better account for model stability and
robustness. First, by identifying DNN architectures that exhibit a similar
latency to each other, we can generate multiple virtual examples to
significantly improve prediction accuracy over MAPLE. Second, hardware
specifications are used to determine the similarity between the training and test
hardware, emphasizing training samples captured from comparable devices
(domains) and encouraging improved domain alignment. Experimental results using
a convolutional neural network NAS benchmark across different types of devices,
including an Intel processor that is now used for embedded vision applications,
demonstrate a 5% improvement over MAPLE and 9% over HELP. Furthermore, we
include ablation studies to independently assess the benefits of virtual
examples and hardware-based sample importance.
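The two ingredients can be illustrated with a minimal sketch (NumPy/scikit-learn). Everything concrete below is an assumption made for the example, not a detail taken from the paper: the feature layout, the relative-latency threshold used to pair architectures, the RBF kernel over device specifications, and the random-forest regressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_virtual_examples(X, y, rel_tol=0.05, n_virtual=2, seed=0):
    """Generate virtual examples by interpolating between pairs of
    architectures whose measured latencies lie within rel_tol of each
    other (an illustrative stand-in for MAPLE-X's virtual examples)."""
    rng = np.random.default_rng(seed)
    X_aug, y_aug = [X], [y]
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if abs(y[i] - y[j]) / max(y[i], y[j]) < rel_tol:
                for _ in range(n_virtual):
                    a = rng.uniform()
                    X_aug.append((a * X[i] + (1 - a) * X[j])[None, :])
                    y_aug.append(np.array([a * y[i] + (1 - a) * y[j]]))
    return np.concatenate(X_aug), np.concatenate(y_aug)

def hardware_sample_weights(train_hw, test_hw, gamma=1.0):
    """Weight each training sample by the similarity of its source device's
    specification vector (clock, cores, cache sizes, ...) to the target
    device, here via an RBF kernel (one plausible similarity choice)."""
    return np.exp(-gamma * np.sum((train_hw - test_hw) ** 2, axis=1))

# Hypothetical usage: rows of X hold [architecture encoding | performance counters],
# y holds measured latencies (ms), and train_hw/test_hw hold device spec vectors.
# X_aug, y_aug = make_virtual_examples(X_train, y_train)
# weights = hardware_sample_weights(train_hw, test_hw)
# model = RandomForestRegressor().fit(X_train, y_train, sample_weight=weights)
```

In practice the two pieces would be combined, with each virtual example inheriting a weight from the devices its parent samples were measured on.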
Related papers
- On Latency Predictors for Neural Architecture Search [8.564763702766776]
We introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets.
We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.
Building on conclusions from our study, we present an end-to-end latency predictor training strategy.
arXiv Detail & Related papers (2024-03-04T19:59:32Z)
- Inference Latency Prediction at the Edge [0.3974789827371669]
State-of-the-art neural architectures (NAs) are typically designed through Neural Architecture Search (NAS) to identify NAs with good tradeoffs between accuracy and efficiency.
Since measuring the latency of a huge set of candidate architectures during NAS is not scalable, approaches are needed for predicting end-to-end inference latency on mobile devices.
We propose a latency prediction framework which addresses these challenges by developing operation-wise latency predictors.
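The operation-wise idea, predicting each operation's latency separately and summing the estimates into an end-to-end figure, can be sketched as follows; the op features and the toy linear models are placeholders, not the paper's predictors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Op:
    """One operation in a network, described by a few hypothetical features."""
    kind: str              # e.g. "conv3x3", "linear"
    features: List[float]  # e.g. [C_in, C_out, H, W, stride]

def predict_end_to_end_latency(ops: List[Op],
                               per_op_models: Dict[str, Callable[[List[float]], float]]) -> float:
    """Operation-wise prediction: estimate each op's latency with an
    op-type-specific model and sum the estimates. (Ignores inter-op
    effects such as operator fusion, which real predictors must handle.)"""
    return sum(per_op_models[op.kind](op.features) for op in ops)

# Toy linear models standing in for learned per-op regressors:
models = {"conv3x3": lambda f: 0.002 * f[0] * f[1], "linear": lambda f: 0.0005 * f[0] * f[1]}
net = [Op("conv3x3", [32, 64, 56, 56, 1]), Op("linear", [1280, 1000])]
print(predict_end_to_end_latency(net, models))  # end-to-end estimate (toy units)
```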
arXiv Detail & Related papers (2022-10-06T00:46:06Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency, but also for resource utilization.
We achieve a 2.8-4x speedup for DNN inference compared to prior hardware-aware NAS methods.
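As a rough illustration of adding a utilization term to a hardware-aware NAS objective, the sketch below uses a generic scalarized form with placeholder weights; it is not U-Boost's differentiable utilization model.

```python
def hardware_aware_nas_objective(task_loss: float,
                                 latency_ms: float,
                                 achieved_tflops: float,
                                 peak_tflops: float,
                                 lat_weight: float = 0.1,
                                 util_weight: float = 0.1) -> float:
    """Scalarized objective: task loss plus penalties for high latency and
    for low hardware utilization (fraction of peak compute actually used)."""
    utilization = achieved_tflops / peak_tflops
    return task_loss + lat_weight * latency_ms + util_weight * (1.0 - utilization)

print(hardware_aware_nas_objective(0.35, 12.0, 1.8, 4.0))  # toy numbers
```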
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude compared to its software, full-precision counterpart, with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
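A minimal sketch of the counter-based idea follows; the counter set, probe scheme, and gradient-boosting regressor are assumptions for the example rather than MAPLE's actual choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical counter set; MAPLE uses a small set of hardware performance
# counters collected on the target device, but not necessarily these.
COUNTERS = ["instructions", "cache-references", "cache-misses", "branch-misses"]

def device_descriptor(counter_values: dict) -> np.ndarray:
    """Turn raw counter readings into a fixed-order feature vector."""
    return np.array([counter_values[c] for c in COUNTERS], dtype=float)

def build_features(arch_encoding: np.ndarray, counter_values: dict) -> np.ndarray:
    """Concatenate the architecture encoding with the device descriptor,
    so a single regressor can generalize across devices."""
    return np.concatenate([arch_encoding, device_descriptor(counter_values)])

# Rows of X hold [arch encoding | device counters]; y holds measured latencies.
# model = GradientBoostingRegressor().fit(X_train, y_train)
# estimate = model.predict(build_features(new_arch, new_device_counters)[None, :])
```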
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
- HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via Meta-Learning [43.751220068642624]
Hardware-adaptive Efficient Latency Predictor (HELP) formulates device-specific latency estimation as a meta-learning problem.
We introduce novel hardware embeddings that treat any device as a black-box function outputting latencies, and meta-learn the hardware-adaptive latency predictor in a device-dependent manner.
We validate HELP on unseen platforms, where it achieves high estimation performance with as few as 10 measurement samples, outperforming all relevant baselines.
arXiv Detail & Related papers (2021-06-16T08:36:21Z)
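The hardware-embedding idea from HELP above can be sketched as follows, with a k-NN regressor standing in for HELP's meta-learned predictor and a hypothetical set of reference architectures.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def device_embedding(measure_latency, reference_archs):
    """Treat a device as a black-box function that outputs latencies:
    its embedding is the vector of latencies measured for a small, fixed
    set of reference architectures (the set itself is hypothetical here)."""
    return np.array([measure_latency(arch) for arch in reference_archs])

def build_training_set(devices, archs, encode, measure, reference_archs):
    """Each training row pairs an architecture encoding with the embedding
    of the device it was measured on, so one predictor serves all devices."""
    X, y = [], []
    for dev in devices:
        emb = device_embedding(lambda a: measure(dev, a), reference_archs)
        for arch in archs:
            X.append(np.concatenate([encode(arch), emb]))
            y.append(measure(dev, arch))
    return np.array(X), np.array(y)

# For an unseen device, only the handful of reference measurements are needed
# to form its embedding; the shared regressor then predicts latency for any
# candidate architecture:
# X, y = build_training_set(train_devices, train_archs, encode, measure, refs)
# model = KNeighborsRegressor(n_neighbors=5).fit(X, y)
# emb = device_embedding(lambda a: measure(new_device, a), refs)
# est = model.predict(np.concatenate([encode(candidate_arch), emb])[None, :])
```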