Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning
Applications on Edge
- URL: http://arxiv.org/abs/2211.07130v1
- Date: Mon, 14 Nov 2022 06:17:32 GMT
- Title: Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning
Applications on Edge
- Authors: SM Zobaed, Ali Mokhtari, Jaya Prakash Champati, Mathieu Kourouma,
Mohsen Amini Salehi
- Abstract summary: This research aims to overcome the memory contention challenge to meet the latency constraints of the Deep Learning applications.
We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory.
We show that Edge-MultiAI can improve the degree of multi-tenancy on the edge by at least 2X and increase the number of warm-starts by around 60% without any major loss in the inference accuracy of the applications.
- Score: 10.067877168224337
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Smart IoT-based systems often require continuous execution of multiple
latency-sensitive Deep Learning (DL) applications. Edge servers serve as the
cornerstone of such IoT-based systems; however, their resource limitations
hamper the continuous execution of multiple (multi-tenant) DL applications. The
challenge is that DL applications function based on bulky "neural network (NN)
models" that cannot be simultaneously maintained in the limited memory space of
the edge. Accordingly, the main contribution of this research is to overcome
the memory contention challenge and thereby meet the latency constraints of
the DL applications without compromising their inference accuracy. We propose
an efficient NN model management framework, called Edge-MultiAI, that ushers
the NN models of the DL applications into the edge memory such that the degree
of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI
leverages NN model compression techniques, such as model quantization, and
dynamically loads NN models for DL applications to increase the degree of
multi-tenancy on the edge server. We also devise a model management heuristic
for Edge-MultiAI, called iWS-BFE, that functions based on Bayesian theory to
predict the inference requests of multi-tenant applications, and uses the
predictions to choose the appropriate NN models for loading, hence increasing
the number of warm-start
inferences. We evaluate the efficacy and robustness of Edge-MultiAI under
various configurations. The results reveal that Edge-MultiAI can improve the
degree of multi-tenancy on the edge by at least 2X and increase the number of
warm-starts by around 60% without any major loss in the inference accuracy of
the applications.
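The abstract describes two cooperating mechanisms: compressed (e.g. quantized) model variants that let more tenants fit into edge memory, and a Bayesian predictor of upcoming inference requests that decides which models to preload for warm starts. As a rough illustration of that idea only (this is not the paper's actual iWS-BFE algorithm; the Beta-Bernoulli estimator, the greedy loading policy, and all class and function names below are assumptions for the sketch):

```python
# Illustrative sketch of warm-start-aware model loading on a memory-limited
# edge server. Assumptions, not the paper's method: a Beta-Bernoulli posterior
# estimates each app's request probability, and a greedy loader packs one
# variant per app, preferring the most accurate variant that still fits.
from dataclasses import dataclass


@dataclass
class ModelVariant:
    app: str          # owning DL application
    size_mb: int      # memory footprint when loaded
    accuracy: float   # relative accuracy (quantized variants score lower)


def request_probability(hits: int, trials: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of a Beta-Bernoulli model of 'app is requested soon'."""
    return (hits + alpha) / (trials + alpha + beta)


def choose_models(history: dict, variants: list, memory_mb: int) -> list:
    """Greedily load one variant per app, most-likely apps first.

    history maps app -> (recent requests observed, observation windows).
    For each app, pick the most accurate variant that fits the remaining
    memory: the full model if possible, otherwise a quantized one.
    """
    probs = {app: request_probability(h, t) for app, (h, t) in history.items()}
    loaded, free = [], memory_mb
    for app in sorted(probs, key=probs.get, reverse=True):
        fitting = [v for v in variants if v.app == app and v.size_mb <= free]
        if fitting:
            best = max(fitting, key=lambda v: v.accuracy)
            loaded.append(best)
            free -= best.size_mb
    return loaded


if __name__ == "__main__":
    variants = [
        ModelVariant("asr", 80, 1.00), ModelVariant("asr", 20, 0.95),
        ModelVariant("vision", 60, 1.00), ModelVariant("vision", 15, 0.90),
    ]
    history = {"asr": (8, 10), "vision": (2, 10)}
    for v in choose_models(history, variants, memory_mb=100):
        print(v.app, v.size_mb, v.accuracy)
```

In this toy run the likely app (asr) gets its full model, and the less likely one (vision) is still kept warm via its quantized variant, mirroring the trade-off the abstract describes between multi-tenancy and inference accuracy.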
Related papers
- DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation [16.44627200990594]
Recommender systems have started to deploy models on edge devices to alleviate network congestion caused by frequent mobile requests.
Several studies have leveraged the proximity of the edge to real-time data, fine-tuning models to create edge-specific variants.
These methods require substantial on-edge computational resources and frequent network transfers to keep the model up to date.
We propose a customizeD slImming framework for incompatiblE neTworks (DIET). DIET deploys the same generic backbone (potentially incompatible for a specific edge) to all devices.
arXiv Detail & Related papers (2024-06-13T04:39:16Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices [17.919425885740793]
We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
arXiv Detail & Related papers (2023-10-11T06:09:14Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Multi-objective Deep Reinforcement Learning for Mobile Edge Computing [11.966938107719903]
Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption.
In this study, we formulate a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay.
We introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, and a sophisticated reward function for accurately computing the utilities of delay and energy consumption.
arXiv Detail & Related papers (2023-07-05T16:36:42Z)
- BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms [12.095934624748686]
Deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications.
It is critical for edge inference platforms to deliver both high throughput and low latency.
This paper proposes BCEdge, a novel learning-based scheduling framework.
arXiv Detail & Related papers (2023-05-01T02:56:43Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than in a standalone manner.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling scheme that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
- Dynamic Sparsity Neural Networks for Automatic Speech Recognition [44.352231175123215]
We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time.
Our trained DSNN model, therefore, can greatly ease the training process and simplify deployment in diverse scenarios with resource constraints.
arXiv Detail & Related papers (2020-05-16T22:08:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.