Exploring the Impact of Virtualization on the Usability of the Deep
Learning Applications
- URL: http://arxiv.org/abs/2112.09780v1
- Date: Fri, 17 Dec 2021 21:51:34 GMT
- Title: Exploring the Impact of Virtualization on the Usability of the Deep
Learning Applications
- Authors: Davood G. Samani, Mohsen Amini Salehi
- Abstract summary: This study measures the impact of four popular execution platforms on the E2E inference time of four types of Deep Learning applications.
The most notable finding is that solution architects must be aware of the DL application characteristics.
- Score: 1.527276935569975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning-based (DL) applications are becoming increasingly popular and
advancing at an unprecedented pace. While many research works are being
undertaken to enhance Deep Neural Networks (DNN) -- the centerpiece of DL
applications -- practical deployment challenges of these applications in the
Cloud and Edge systems, and their impact on the usability of the applications
have not been sufficiently investigated. In particular, the impact of deploying
different virtualization platforms, offered by the Cloud and Edge, on the
usability of DL applications (in terms of the End-to-End (E2E) inference time)
has remained an open question. Importantly, resource elasticity (by means of
scale-up), CPU pinning, and processor type (CPU vs GPU) configurations have
been shown to influence the virtualization overhead. Accordingly, the goal
of this research is to study the impact of these potentially decisive
deployment options on the E2E performance, and thus the usability, of the DL
applications. To that end, we measure the impact of four popular execution
platforms (namely, bare-metal, virtual machine (VM), container, and container
in VM) on the E2E inference time of four types of DL applications, upon
changing processor configuration (scale-up, CPU pinning) and processor types.
This study reveals a set of interesting and sometimes counter-intuitive
findings that can be used as best practices by Cloud solution architects to
efficiently deploy DL applications in various systems. The most notable finding is
that solution architects must be aware of the DL application
characteristics, particularly their pre- and post-processing requirements, to
be able to optimally choose and configure an execution platform, determine the
use of GPU, and decide the efficient scale-up range.
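The measurement setup described above lends itself to a compact harness. Below is a minimal sketch (not the authors' actual code) that decomposes the E2E inference time into pre-processing, DNN inference, and post-processing, and uses Linux CPU affinity to emulate the CPU-pinning and scale-up knobs; all function names here are illustrative assumptions:

```python
import os
import time

def pin_to_cores(n_cores):
    """Pin this process to the first n_cores logical CPUs (Linux only),
    emulating the paper's CPU-pinning and scale-up knobs."""
    os.sched_setaffinity(0, set(range(n_cores)))

def measure_e2e(preprocess, infer, postprocess, raw_input, repeats=30):
    """Time each phase of the E2E pipeline separately and return mean
    per-phase latencies in seconds. The three callables stand in for an
    application's actual pre-processing, DNN inference, and
    post-processing stages."""
    totals = {"pre": 0.0, "infer": 0.0, "post": 0.0}
    for _ in range(repeats):
        t0 = time.perf_counter()
        x = preprocess(raw_input)
        t1 = time.perf_counter()
        y = infer(x)
        t2 = time.perf_counter()
        postprocess(y)
        t3 = time.perf_counter()
        totals["pre"] += t1 - t0
        totals["infer"] += t2 - t1
        totals["post"] += t3 - t2
    return {phase: t / repeats for phase, t in totals.items()}

# Sweep the scale-up range by re-running the measurement with 1..N cores.
for n_cores in (1, 2, 4, 8):
    pin_to_cores(n_cores)
    # stats = measure_e2e(app_pre, app_infer, app_post, sample_input)
```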
Related papers
- Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls [22.49750818224266]
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications.
Mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors.
This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
arXiv Detail & Related papers (2024-05-03T04:47:23Z)
- Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures [56.200335252600354]
It is common practice to deploy pre-trained models in environments distinct from their native development settings.
This led to the introduction of interchange formats such as ONNX and their accompanying runtimes (e.g., ONNX Runtime), which serve as standard formats for running models across different runtime infrastructures (see the sketch below).
arXiv Detail & Related papers (2024-02-21T09:18:44Z)
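As a hedged illustration of such interchange formats (not code from the paper above), the sketch below exports a toy PyTorch model to ONNX and executes it under ONNX Runtime, a runtime infrastructure distinct from the native development framework; the file name and toy architecture are arbitrary assumptions:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# A toy model standing in for any pre-trained network.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
dummy = torch.randn(1, 8)

# Export to the ONNX interchange format.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["x"], output_names=["y"])

# Run the exported model under a different runtime (ONNX Runtime, CPU).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(out,) = sess.run(["y"], {"x": dummy.numpy()})
print(out.shape)  # (1, 2)
```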
- etuner: A Redundancy-Aware Framework for Efficient Continual Learning Application on Edge Devices [47.365775210055396]
We propose ETuner, an efficient edge continual learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency.
Experimental results show that, on average, ETuner reduces overall fine-tuning execution time by 64%, energy consumption by 56%, and improves average inference accuracy by 1.75% over the immediate model fine-tuning approach.
arXiv Detail & Related papers (2024-01-30T02:41:05Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion (see the sketch after this entry).
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
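To make the two-step decomposition above concrete, here is a toy Python sketch under stated assumptions (it does not reproduce the paper's TPP library or API): a small GEMM primitive plays the role of a TPP, and the surrounding loop nest is specified declaratively by a loop order and tile size that can change without touching the primitive.

```python
import numpy as np

# Step 1: the computational core as a small, reusable primitive
# (a stand-in for a Tensor Processing Primitive).
def gemm_tile(C, A, B):
    """Accumulate one small GEMM tile in place: C += A @ B."""
    C += A @ B

# Step 2: the logical loops around the primitive, expressed as a
# declarative specification (loop order over "i", "j", "k" and a tile
# size) that a framework could reorder or re-block freely.
def matmul(A, B, tile=64, loop_order=("i", "j", "k")):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    ranges = {"i": range(0, M, tile),
              "j": range(0, N, tile),
              "k": range(0, K, tile)}
    for a in ranges[loop_order[0]]:
        for b in ranges[loop_order[1]]:
            for c in ranges[loop_order[2]]:
                idx = dict(zip(loop_order, (a, b, c)))
                i, j, k = idx["i"], idx["j"], idx["k"]
                gemm_tile(C[i:i+tile, j:j+tile],
                          A[i:i+tile, k:k+tile],
                          B[k:k+tile, j:j+tile])
    return C
```

Swapping `loop_order` or `tile` re-blocks the traversal while the computational core stays fixed, which is the separation of concerns the entry describes.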
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that enables maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge [10.067877168224337]
This research aims to overcome the memory contention challenge to meet the latency constraints of Deep Learning applications.
We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory (a toy warm-start cache is sketched after this entry).
We show that Edge-MultiAI can increase the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss of inference accuracy.
arXiv Detail & Related papers (2022-11-14T06:17:32Z)
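As a rough illustration of the warm-start idea above (a sketch only; Edge-MultiAI's actual model-management heuristics are not reproduced here), the following cache keeps recently used NN models resident within a fixed memory budget and evicts the least-recently-used model to make room:

```python
from collections import OrderedDict

class ModelCache:
    """A minimal warm-start cache: keep as many NN models resident in
    the limited edge memory as fit, so repeat requests avoid cold loads.
    The class name, size accounting, and LRU policy are illustrative."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.models = OrderedDict()  # name -> (model, size_mb), LRU order

    def get(self, name, loader, size_mb):
        if name in self.models:               # warm start: already resident
            self.models.move_to_end(name)
            return self.models[name][0]
        # Evict least-recently-used models until the new one fits.
        while self.used_mb + size_mb > self.capacity_mb and self.models:
            _, (_, evicted_mb) = self.models.popitem(last=False)
            self.used_mb -= evicted_mb
        model = loader()                      # cold start: load from storage
        self.models[name] = (model, size_mb)
        self.used_mb += size_mb
        return model
```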
- Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum [55.6313942302582]
We propose a methodology to support the optimization of real-life applications on the Edge-to-Cloud Continuum.
Our approach relies on a rigorous analysis of possible configurations in a controlled testbed environment to understand their behaviour.
Our methodology can be generalized to other applications in the Edge-to-Cloud Continuum.
arXiv Detail & Related papers (2021-08-04T07:35:14Z)
- OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices [5.522962791793502]
OODIn is a framework for the optimised deployment of deep learning apps across heterogeneous mobile devices.
It counteracts the variability in device resources and DL models by means of a highly parametrised multi-layer design.
It delivers up to 4.3x and 3.5x performance gain over highly optimised platform- and model-aware designs.
arXiv Detail & Related papers (2021-06-08T22:38:18Z)
- Optimising Resource Management for Embedded Machine Learning [23.00896228073755]
Machine learning inference is increasingly being executed locally on mobile and embedded platforms.
We show approaches for online resource management in heterogeneous multi-core systems.
arXiv Detail & Related papers (2021-05-08T06:10:05Z)