Exploring the Impact of Virtualization on the Usability of the Deep
Learning Applications
- URL: http://arxiv.org/abs/2112.09780v1
- Date: Fri, 17 Dec 2021 21:51:34 GMT
- Title: Exploring the Impact of Virtualization on the Usability of the Deep
Learning Applications
- Authors: Davood G. Samani, Mohsen Amini Salehi
- Abstract summary: This study measures the impact of four popular execution platforms on the E2E inference time of four types of Deep Learning applications.
The most notable finding is that solution architects must be aware of the DL application characteristics.
- Score: 1.527276935569975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning-based (DL) applications are becoming increasingly popular and
advancing at an unprecedented pace. While many research works are being
undertaken to enhance Deep Neural Networks (DNN) -- the centerpiece of DL
applications -- practical deployment challenges of these applications in the
Cloud and Edge systems, and their impact on the usability of the applications
have not been sufficiently investigated. In particular, the impact of deploying
different virtualization platforms, offered by the Cloud and Edge, on the
usability of DL applications (in terms of the End-to-End (E2E) inference time)
has remained an open question. Importantly, resource elasticity (by means of
scale-up), CPU pinning, and processor type (CPU vs GPU) configurations have
been shown to influence the virtualization overhead. Accordingly, the goal
of this research is to study the impact of these potentially decisive
deployment options on the E2E performance, and thus the usability, of the DL
applications. To that end, we measure the impact of four popular execution
platforms (namely, bare-metal, virtual machine (VM), container, and container
in VM) on the E2E inference time of four types of DL applications, upon
changing processor configuration (scale-up, CPU pinning) and processor types.
This study reveals a set of interesting and sometimes counter-intuitive
findings that can be used as best practices by Cloud solution architects to
efficiently deploy DL applications in various systems. The most notable finding is
that solution architects must be aware of the DL application
characteristics, particularly their pre- and post-processing requirements, to
be able to optimally choose and configure an execution platform, determine the
use of GPU, and decide the efficient scale-up range.
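The measurement setup described above lends itself to a compact harness. Below is a minimal sketch (not the authors' actual code) that decomposes the E2E inference time into pre-processing, DNN inference, and post-processing, and uses Linux CPU affinity to emulate the CPU-pinning and scale-up knobs; all function names here are illustrative assumptions:

```python
import os
import time

def pin_to_cores(n_cores):
    """Pin this process to the first n_cores logical CPUs (Linux only),
    emulating the paper's CPU-pinning and scale-up knobs."""
    os.sched_setaffinity(0, set(range(n_cores)))

def measure_e2e(preprocess, infer, postprocess, raw_input, repeats=30):
    """Time each phase of the E2E pipeline separately and return mean
    per-phase latencies in seconds. The three callables stand in for an
    application's actual pre-processing, DNN inference, and
    post-processing stages."""
    totals = {"pre": 0.0, "infer": 0.0, "post": 0.0}
    for _ in range(repeats):
        t0 = time.perf_counter()
        x = preprocess(raw_input)
        t1 = time.perf_counter()
        y = infer(x)
        t2 = time.perf_counter()
        postprocess(y)
        t3 = time.perf_counter()
        totals["pre"] += t1 - t0
        totals["infer"] += t2 - t1
        totals["post"] += t3 - t2
    return {phase: t / repeats for phase, t in totals.items()}

# Sweep the scale-up range by re-running the measurement with 1..N cores.
for n_cores in (1, 2, 4, 8):
    pin_to_cores(n_cores)
    # stats = measure_e2e(app_pre, app_infer, app_post, sample_input)
```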
Related papers
- Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls [22.49750818224266]
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications.
Mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors.
This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors.
arXiv Detail & Related papers (2024-05-03T04:47:23Z)
- Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures [56.200335252600354]
It is common practice to deploy pre-trained models in environments distinct from their native development settings.
This led to the introduction of interchange formats such as ONNX and their accompanying runtimes (e.g., ONNX Runtime), which serve as standard formats for running models across different runtime infrastructures (see the sketch below).
arXiv Detail & Related papers (2024-02-21T09:18:44Z)
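As a hedged illustration of such interchange formats (not code from the paper above), the sketch below exports a toy PyTorch model to ONNX and executes it under ONNX Runtime, a runtime infrastructure distinct from the native development framework; the file name and toy architecture are arbitrary assumptions:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# A toy model standing in for any pre-trained network.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
dummy = torch.randn(1, 8)

# Export to the ONNX interchange format.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["x"], output_names=["y"])

# Run the exported model under a different runtime (ONNX Runtime, CPU).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(out,) = sess.run(["y"], {"x": dummy.numpy()})
print(out.shape)  # (1, 2)
```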
- etuner: A Redundancy-Aware Framework for Efficient Continual Learning Application on Edge Devices [47.365775210055396]
We propose ETuner, an efficient edge continual learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency.
Experimental results show that, on average, ETuner reduces overall fine-tuning execution time by 64%, energy consumption by 56%, and improves average inference accuracy by 1.75% over the immediate model fine-tuning approach.
arXiv Detail & Related papers (2024-01-30T02:41:05Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion (see the sketch after this entry).
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
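To make the two-step decomposition above concrete, here is a toy Python sketch under stated assumptions (it does not reproduce the paper's TPP library or API): a small GEMM primitive plays the role of a TPP, and the surrounding loop nest is specified declaratively by a loop order and tile size that can change without touching the primitive.

```python
import numpy as np

# Step 1: the computational core as a small, reusable primitive
# (a stand-in for a Tensor Processing Primitive).
def gemm_tile(C, A, B):
    """Accumulate one small GEMM tile in place: C += A @ B."""
    C += A @ B

# Step 2: the logical loops around the primitive, expressed as a
# declarative specification (loop order over "i", "j", "k" and a tile
# size) that a framework could reorder or re-block freely.
def matmul(A, B, tile=64, loop_order=("i", "j", "k")):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    ranges = {"i": range(0, M, tile),
              "j": range(0, N, tile),
              "k": range(0, K, tile)}
    for a in ranges[loop_order[0]]:
        for b in ranges[loop_order[1]]:
            for c in ranges[loop_order[2]]:
                idx = dict(zip(loop_order, (a, b, c)))
                i, j, k = idx["i"], idx["j"], idx["k"]
                gemm_tile(C[i:i+tile, j:j+tile],
                          A[i:i+tile, k:k+tile],
                          B[k:k+tile, j:j+tile])
    return C
```

Swapping `loop_order` or `tile` re-blocks the traversal while the computational core stays fixed, which is the separation of concerns the entry describes.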
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that enables maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge [10.067877168224337]
This research aims to overcome the memory contention challenge to meet the latency constraints of Deep Learning applications.
We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory (a toy warm-start cache is sketched after this entry).
We show that Edge-MultiAI can increase the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss of inference accuracy.
arXiv Detail & Related papers (2022-11-14T06:17:32Z)
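As a rough illustration of the warm-start idea above (a sketch only; Edge-MultiAI's actual model-management heuristics are not reproduced here), the following cache keeps recently used NN models resident within a fixed memory budget and evicts the least-recently-used model to make room:

```python
from collections import OrderedDict

class ModelCache:
    """A minimal warm-start cache: keep as many NN models resident in
    the limited edge memory as fit, so repeat requests avoid cold loads.
    The class name, size accounting, and LRU policy are illustrative."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.models = OrderedDict()  # name -> (model, size_mb), LRU order

    def get(self, name, loader, size_mb):
        if name in self.models:               # warm start: already resident
            self.models.move_to_end(name)
            return self.models[name][0]
        # Evict least-recently-used models until the new one fits.
        while self.used_mb + size_mb > self.capacity_mb and self.models:
            _, (_, evicted_mb) = self.models.popitem(last=False)
            self.used_mb -= evicted_mb
        model = loader()                      # cold start: load from storage
        self.models[name] = (model, size_mb)
        self.used_mb += size_mb
        return model
```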
- Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum [55.6313942302582]
We propose a methodology to support the optimization of real-life applications on the Edge-to-Cloud Continuum.
Our approach relies on a rigorous analysis of possible configurations in a controlled testbed environment to understand their behaviour.
Our methodology can be generalized to other applications in the Edge-to-Cloud Continuum.
arXiv Detail & Related papers (2021-08-04T07:35:14Z)
- OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices [5.522962791793502]
OODIn is a framework for the optimised deployment of deep learning apps across heterogeneous mobile devices.
It counteracts the variability in device resources and DL models by means of a highly parametrised multi-layer design.
It delivers up to 4.3x and 3.5x performance gain over highly optimised platform- and model-aware designs.
arXiv Detail & Related papers (2021-06-08T22:38:18Z)
- Optimising Resource Management for Embedded Machine Learning [23.00896228073755]
Machine learning inference is increasingly being executed locally on mobile and embedded platforms.
We show approaches for online resource management in heterogeneous multi-core systems.
arXiv Detail & Related papers (2021-05-08T06:10:05Z)