CDMPP: A Device-Model Agnostic Framework for Latency Prediction of
Tensor Programs
- URL: http://arxiv.org/abs/2311.09690v2
- Date: Fri, 17 Nov 2023 08:23:11 GMT
- Title: CDMPP: A Device-Model Agnostic Framework for Latency Prediction of
Tensor Programs
- Authors: Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin
Lin, Chuan Wu
- Abstract summary: Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications.
Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks.
We propose CDMPP, an efficient tensor program latency prediction framework for both cross-model and cross-device prediction.
- Score: 11.025071880642974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have shown excellent performance in a wide range
of machine learning applications. Knowing the latency of running a DNN model or
tensor program on a specific device is useful in various tasks, such as DNN
graph- or tensor-level optimization and device selection. Considering the large
space of DNN models and devices that impede direct profiling of all
combinations, recent efforts focus on building a predictor to model the
performance of DNN models on different devices. However, none of the existing
attempts have achieved a cost model that can accurately predict the performance
of various tensor programs while supporting both training and inference
accelerators. We propose CDMPP, an efficient tensor program latency prediction
framework for both cross-model and cross-device prediction. We design an
informative but efficient representation of tensor programs, called compact
ASTs, and a pre-order-based positional encoding method, to capture the internal
structure of tensor programs. We develop a domain-adaptation-inspired method to
learn domain-invariant representations and devise a KMeans-based sampling
algorithm, for the predictor to learn from different domains (i.e., different
DNN operators and devices). Our extensive experiments on a diverse range of DNN
models and devices demonstrate that CDMPP significantly outperforms
state-of-the-art baselines with 14.03% and 10.85% prediction error for
cross-model and cross-device prediction, respectively, and one order of
magnitude higher training efficiency. The implementation and the expanded
dataset are available at https://github.com/joapolarbear/cdmpp.
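To make the representation concrete, below is a minimal, hypothetical sketch of a compact-AST node with a pre-order-based positional encoding. The `ASTNode` class, the toy expression, and the sinusoidal encoding dimensions are illustrative assumptions, not the paper's exact feature set.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ASTNode:
    """One node of a (compact) tensor-program AST; `op` and `children` are illustrative."""
    op: str
    children: list = field(default_factory=list)

def preorder_positions(root):
    """Assign each node its index in a pre-order (root-first) traversal."""
    positions, stack = {}, [root]
    while stack:
        node = stack.pop()
        positions[id(node)] = len(positions)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return positions

def sinusoidal_encoding(pos, dim=16):
    """Transformer-style sinusoidal encoding of a scalar pre-order index."""
    enc = []
    for i in range(dim // 2):
        freq = pos / (10000 ** (2 * i / dim))
        enc.extend([math.sin(freq), math.cos(freq)])
    return enc

# Toy AST for a fused elementwise expression: add(mul(a, b), c)
leaf_a, leaf_b, leaf_c = ASTNode("a"), ASTNode("b"), ASTNode("c")
tree = ASTNode("add", [ASTNode("mul", [leaf_a, leaf_b]), leaf_c])

pos = preorder_positions(tree)
print(pos[id(tree)], pos[id(leaf_c)])          # 0 4: root first, rightmost leaf last
print(len(sinusoidal_encoding(pos[id(tree)], dim=8)))  # 8-dim vector per node
```

The intuition is that a pre-order index is cheap to compute and uniquely locates each node in the tree, so a fixed encoding of that index gives a regressor access to the program's internal structure without expensive graph matching.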
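In the same spirit, the KMeans-based sampling idea can be approximated by clustering tensor-program feature vectors and drawing a balanced batch from every cluster, so that rare operator/device domains are not drowned out by common ones. This is a hedged sketch using scikit-learn; `kmeans_balanced_sample` and all parameter values are our assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_balanced_sample(features, n_clusters=8, per_cluster=32, seed=0):
    """Cluster samples in feature space, then draw up to a fixed number
    from each cluster so under-represented domains still appear per batch."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(features)
    picked = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, idx.size)
        picked.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(picked)

# Toy usage: 1,000 synthetic tensor-program feature vectors of dimension 16.
X = np.random.rand(1000, 16)
batch_idx = kmeans_balanced_sample(X, n_clusters=8, per_cluster=16)
print(batch_idx.shape)  # (128,) when every cluster has at least 16 members
```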
Related papers
- Few-Shot Testing: Estimating Uncertainty of Memristive Deep Neural Networks Using One Bayesian Test Vector [0.0]
We propose a test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware.
Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise to show that it can consistently achieve 100% coverage with only 0.024 MB of memory overhead.
arXiv Detail & Related papers (2024-05-29T08:53:16Z)
- Anole: Adapting Diverse Compressed Models For Cross-Scene Prediction On Mobile Devices [17.542012577533015]
Anole is a lightweight scheme for local DNN model inference on mobile devices.
We implement Anole on different types of mobile devices and conduct extensive trace-driven and real-world experiments based on unmanned aerial vehicles (UAVs).
arXiv Detail & Related papers (2024-05-09T12:06:18Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [8.272409756443539]
This paper describes PerfSAGE, a novel graph neural network (GNN) that predicts the inference latency, energy, and memory footprint of an arbitrary DNN TFlite graph.
Using this dataset, we train PerfSAGE and provide experimental results that demonstrate state-of-the-art prediction accuracy with a Mean Absolute Percentage Error of 5% across all targets and model search spaces.
arXiv Detail & Related papers (2023-01-26T08:59:15Z)
- Towards a learning-based performance modeling for accelerating Deep Neural Networks [1.1549572298362785]
We start an investigation of predictive models based on machine learning techniques in order to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms the library's manual selection of convolution operators.
arXiv Detail & Related papers (2022-12-09T18:28:07Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently at the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare the estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)