Related papers: Cost-effective Machine Learning Inference Offload for Edge Computing

URL: http://arxiv.org/abs/2012.04063v1
Date: Mon, 7 Dec 2020 21:11:02 GMT
Title: Cost-effective Machine Learning Inference Offload for Edge Computing
Authors: Christian Makaya, Amalendu Iyer, Jonathan Salfity, Madhu Athreya, M Anthony Lewis
Abstract summary: This paper proposes a novel offloading mechanism by leveraging installed-base on-premises (edge) computational resources. The proposed mechanism allows the edge devices to offload heavy and compute-intensive workloads to edge nodes instead of using remote cloud.
Score: 0.3149883354098941
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Computing at the edge is increasingly important because massive amounts of data are generated there. This poses challenges in transporting all that data to remote data centers and the cloud, where it can be processed and analyzed. On the other hand, harnessing edge data is essential for offering data-driven and machine learning-based applications, provided that challenges such as device capabilities, connectivity, and heterogeneity can be mitigated. Machine learning applications are very compute-intensive and require processing large amounts of data. However, edge devices are often resource-constrained in terms of compute resources, power, storage, and network connectivity, which limits their ability to run state-of-the-art deep neural network (DNN) models efficiently and accurately as those models become larger and more complex. This paper proposes a novel offloading mechanism by leveraging installed-base on-premises (edge) computational resources. The proposed mechanism allows the edge devices to offload heavy and compute-intensive workloads to edge nodes instead of using remote cloud. Our offloading mechanism has been prototyped and tested with state-of-the-art person and object detection DNN models for mobile robots and video surveillance applications. The performance shows a significant gain compared to cloud-based offloading strategies in terms of accuracy and latency.

Related papers

Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning [39.73152182572741]
This paper proposes a novel framework, named Quantized Split Federated Fine-Tuning Large AI Model (SFLAM). By partitioning the training load between edge devices and servers, SFLAM can facilitate the operation of large models on devices. SFLAM incorporates quantization management, power control, and bandwidth allocation strategies to enhance training efficiency.
arXiv Detail & Related papers (2025-04-12T07:55:11Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Driving Intelligent IoT Monitoring and Control through Cloud Computing and Machine Learning [3.134387323162717]
This article explores how to drive intelligent IoT monitoring and control through cloud computing and machine learning. The paper also introduces the development of IoT monitoring and control technology, the application of edge computing in IoT monitoring and control, and the role of machine learning in data analysis and fault detection.
arXiv Detail & Related papers (2024-03-26T20:59:48Z)
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems [12.427821850039448]
We propose a novel split computing approach based on slimmable ensemble encoders. The key advantage of our design is the ability to adapt computational load and transmitted data size in real time with minimal overhead and time. Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
arXiv Detail & Related papers (2023-06-22T06:33:12Z)
The MIT Supercloud Workload Classification Challenge [10.458111248130944]
In this paper, we present a workload classification challenge based on the MIT Supercloud dataset. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads.
arXiv Detail & Related papers (2022-04-12T14:28:04Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using a basic cross-platform tensor framework and script language engines. This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox [0.0]
Edge devices offer limited processing power due to their inexpensive hardware, and limited cooling and computational resources. We propose a novel deep convolutional neural network architecture called EffCNet for edge devices.
arXiv Detail & Related papers (2021-11-28T21:32:31Z)
Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks. In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework. In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems [9.273593723275544]
IoT and machine learning applications create large amounts of data that require real-time processing. We propose a distributed AI system to exploit both the edge and the cloud for training and inference.
arXiv Detail & Related papers (2021-09-14T05:03:54Z)
Auto-Split: A General Framework of Collaborative Edge-Cloud AI [49.750972428032355]
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv Detail & Related papers (2021-08-30T08:03:29Z)
Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments. It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC. To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.