Analyzing the Performance of Smart Industry 4.0 Applications on Cloud
Computing Systems
- URL: http://arxiv.org/abs/2012.06054v1
- Date: Fri, 11 Dec 2020 00:18:05 GMT
- Title: Analyzing the Performance of Smart Industry 4.0 Applications on Cloud
Computing Systems
- Authors: Razin Farhan Hussain, Alireza Pakravan, Mohsen Amini Salehi
- Abstract summary: Cloud-based Deep Neural Network (DNN) applications are becoming an indispensable part of Industry 4.0.
Such stochasticity, if not captured, can potentially lead to low Quality of Service (QoS) or even a disaster in critical sectors, such as the Oil and Gas industry.
This study provides a descriptive analysis of the inference time from two perspectives.
- Score: 1.292804228022353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cloud-based Deep Neural Network (DNN) applications that make
latency-sensitive inference are becoming an indispensable part of Industry 4.0.
Due to the multi-tenancy and resource heterogeneity, both inherent to cloud
computing environments, the inference time of DNN-based applications is
stochastic. Such stochasticity, if not captured, can potentially lead to low
Quality of Service (QoS) or even a disaster in critical sectors, such as the Oil
and Gas industry. To make Industry 4.0 robust, solution architects and
researchers need to understand the behavior of DNN-based applications and
capture the stochasticity that exists in their inference times. Accordingly, in this
study, we provide a descriptive analysis of the inference time from two
perspectives. First, we perform an application-centric analysis and
statistically model the execution time of four categorically different DNN
applications on both Amazon and Chameleon clouds. Second, we take a
resource-centric approach and analyze a rate-based metric in the form of Million
Instructions Per Second (MIPS) for heterogeneous machines in the cloud. This
non-parametric modeling, achieved via Jackknife and Bootstrap re-sampling
methods, provides the confidence interval of MIPS for heterogeneous cloud
machines. The findings of this research can be helpful for researchers and
cloud solution architects to develop solutions that are robust against the
stochastic nature of the inference time of DNN applications in the cloud and
can offer a higher QoS to their users and avoid unintended outcomes.
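The abstract's resource-centric analysis builds non-parametric confidence intervals of MIPS via Jackknife and Bootstrap re-sampling. A minimal sketch of both estimators is below; the MIPS values are hypothetical placeholders, not the paper's measurements, and the paper's exact procedure may differ.

```python
import random
import statistics

# Hypothetical MIPS samples for one heterogeneous cloud machine type;
# real values would come from benchmarking runs, as in the study.
mips_samples = [4120.5, 3987.2, 4250.1, 4033.8, 4198.4,
                3901.7, 4075.3, 4160.9, 3955.6, 4102.2]

def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=42):
    """Non-parametric bootstrap CI for the mean: resample with
    replacement, record each resample's mean, and take the
    (alpha/2, 1 - alpha/2) quantiles of the resulting distribution."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def jackknife_se(samples):
    """Jackknife standard error of the mean: recompute the mean with
    one observation left out at a time, then aggregate the spread."""
    n = len(samples)
    loo_means = [statistics.mean(samples[:i] + samples[i + 1:])
                 for i in range(n)]
    grand = statistics.mean(loo_means)
    var = (n - 1) / n * sum((m - grand) ** 2 for m in loo_means)
    return var ** 0.5

lo, hi = bootstrap_ci(mips_samples)
print(f"95% bootstrap CI for mean MIPS: [{lo:.1f}, {hi:.1f}]")
print(f"Jackknife SE of mean MIPS: {jackknife_se(mips_samples):.1f}")
```

For the sample mean, the jackknife standard error coincides with the classical s/sqrt(n) estimate; the bootstrap quantile interval, by contrast, makes no symmetry assumption, which is why it suits skewed inference-time distributions.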
Related papers
- Alioth: A Machine Learning Based Interference-Aware Performance Monitor
for Multi-Tenancy Applications in Public Cloud [15.942285615596566]
Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation.
We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications.
Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage.
arXiv Detail & Related papers (2023-07-18T03:34:33Z)
- Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services [23.408109000977987]
Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures.
We present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources.
Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.
arXiv Detail & Related papers (2023-07-04T12:30:01Z)
- Spatial-SpinDrop: Spatial Dropout-based Binary Bayesian Neural Network with Spintronics Implementation [1.3603499630771996]
We introduce MC-SpatialDropout, a spatial dropout-based approximate BayNNs with spintronics emerging devices.
The number of dropout modules per network layer is reduced by a factor of $9\times$ and energy consumption by a factor of $94.11\times$, while still achieving comparable predictive performance and uncertainty estimates.
arXiv Detail & Related papers (2023-06-16T21:38:13Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- Auto-Split: A General Framework of Collaborative Edge-Cloud AI [49.750972428032355]
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud.
To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv Detail & Related papers (2021-08-30T08:03:29Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud [13.315410752311768]
A popular alternative comprises offloading CNN processing to powerful cloud-based servers.
SPINN is a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method.
It provides robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
arXiv Detail & Related papers (2020-08-14T15:00:19Z)
- Trust-Based Cloud Machine Learning Model Selection For Industrial IoT and Smart City Services [5.333802479607541]
We consider the paradigm where cloud service providers collect big data from resource-constrained devices for building Machine Learning prediction models.
Our proposed solution comprises an intelligent run-time reconfiguration that maximizes the level of trust of ML models.
Our results show that the selected model's trust level is 0.7% to 2.53% less compared to the results obtained using ILP.
arXiv Detail & Related papers (2020-08-11T23:58:03Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.