Related papers: Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach

URL: http://arxiv.org/abs/2410.05338v1
Date: Sun, 6 Oct 2024 20:14:27 GMT
Title: Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach
Authors: Divya Jyoti Bajpai, Manjesh Kumar Hanawal
Abstract summary: Deep Neural Networks (DNNs) have demonstrated outstanding performance across various domains. A distributed inference setup can be used where a small-sized DNN is deployed on mobile, a bigger version on the edge, and the full-fledged model on the cloud. We develop a novel approach that utilizes Early Exit (EE) strategies developed to minimize inference latency in DNNs.
Score: 5.402030962296633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Deep Neural Networks (DNNs) have demonstrated outstanding performance across various domains. However, their large size is a challenge for deployment on resource-constrained devices such as mobile, edge, and IoT platforms. To overcome this, a distributed inference setup can be used where a small-sized DNN (initial few layers) is deployed on mobile, a bigger version on the edge, and the full-fledged model on the cloud. A sample with low complexity (easy) could then be inferred on mobile, one with moderate complexity (medium) on the edge, and one with higher complexity (hard) on the cloud. As the complexity of each sample is not known beforehand, the following question arises in distributed inference: how to decide the complexity of a sample so that it is processed by enough layers of the DNN. We develop a novel approach named DIMEE that utilizes Early Exit (EE) strategies developed to minimize inference latency in DNNs. DIMEE aims to improve the accuracy, taking into account the offloading cost from mobile to edge/cloud. Experimental validation on GLUE datasets, encompassing various NLP tasks, shows that our method significantly reduces the inference cost (> 43%) while maintaining a minimal drop in accuracy (< 0.3%) compared to the case where all the inference is made in the cloud.
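
Illustration: the abstract does not spell out DIMEE's decision rule, so the following is only a minimal sketch of a generic confidence-threshold early-exit policy across the three tiers, not the authors' method. The names `TIERS`, `route`, and the `threshold` value of 0.9 are assumptions made for this example. It also assumes the logits at all three exits are already available; a real system would compute each later exit only after the earlier one declines to answer.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical tier names, ordered from shallowest to deepest model.
TIERS = ("mobile", "edge", "cloud")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(exit_logits: Sequence[Sequence[float]],
          threshold: float = 0.9) -> Tuple[str, int, float]:
    """Return (tier, predicted_class, confidence) for one sample.

    exit_logits[i] holds the classifier logits at the exit attached to
    TIERS[i]. The sample stops at the first tier whose top softmax
    probability reaches `threshold`; the last tier always answers.
    """
    if len(exit_logits) != len(TIERS):
        raise ValueError("expected one logit vector per tier")
    for tier, logits in zip(TIERS, exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        pred = probs.index(conf)
        if conf >= threshold or tier == TIERS[-1]:
            return tier, pred, conf
    raise AssertionError("unreachable: the last tier always returns")


if __name__ == "__main__":
    easy = [[4.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
    hard = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]
    print(route(easy)[0])  # stops at the first exit
    print(route(hard)[0])  # falls through to the last tier
```

Under this policy, raising `threshold` sends more samples to the edge and cloud (higher offloading cost, typically higher accuracy), while lowering it keeps more inference on the mobile device.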

Related papers

A Survey of Early Exit Deep Neural Networks in NLP [5.402030962296633]
Deep Neural Networks (DNNs) have grown increasingly large in size to achieve state of the art performance across a wide range of tasks. High computational requirements make them less suitable for resource-constrained applications. Early exit strategies offer a promising solution by enabling adaptive inference.
arXiv Detail & Related papers (2025-01-13T20:08:52Z)
Distributed Inference on Mobile Edge and Cloud: A Data-Cartography based Clustering Approach [5.402030962296633]
In a distributed inference framework, smaller DNNs can be deployed on mobile and edge devices, and the full DNN on the cloud. We introduce a novel method that employs data cartography to assess sample complexity. Our approach significantly lowers inference costs by more than 43% while maintaining a minimal accuracy drop of less than 0.5% compared to performing all inferences on the cloud.
arXiv Detail & Related papers (2024-12-21T13:20:26Z)
MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices. We optimise a large network family using both labelled and unlabelled data. We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
I-SplitEE: Image classification in Split Computing DNNs with Early Exits [5.402030962296633]
The large size of Deep Neural Networks (DNNs) hinders deploying them on resource-constrained devices like edge, mobile, and IoT platforms. Our work presents an innovative unified approach merging early exits and split computing. I-SplitEE is an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data.
arXiv Detail & Related papers (2024-01-19T07:44:32Z)
Streaming Video Analytics On The Edge With Asynchronous Cloud Support [2.7456483236562437]
We propose a novel edge-cloud fusion algorithm that fuses edge and cloud predictions, achieving low latency and high accuracy. We focus on object detection in videos (applicable in many video analytics scenarios) and show that the fused edge-cloud predictions can outperform the accuracy of edge-only and cloud-only scenarios by as much as 50%.
arXiv Detail & Related papers (2022-10-04T06:22:13Z)
Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge [28.61344039233783]
We propose to facilitate the application of deep neural networks (DNNs) on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined edge ensembles, is based on having diverse predictors at each device, which form an ensemble of models during inference. We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network.
arXiv Detail & Related papers (2022-06-07T10:24:20Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter models and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems [9.273593723275544]
IoT and machine learning applications create large amounts of data that require real-time processing. We propose a distributed AI system to exploit both the edge and the cloud for training and inference.
arXiv Detail & Related papers (2021-09-14T05:03:54Z)
Data-Driven Low-Rank Neural Network Compression [8.025818540338518]
We propose a Data-Driven Low-rank (DDLR) method to reduce the number of parameters of pretrained Deep Neural Networks (DNNs). We show that it is possible to significantly reduce the number of parameters with only a small reduction in classification accuracy.
arXiv Detail & Related papers (2021-07-13T00:10:21Z)
Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay. We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems. We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
Adaptive Anomaly Detection for IoT Data in Hierarchical Edge Computing [71.86955275376604]
We propose an adaptive anomaly detection approach for hierarchical edge computing (HEC) systems to solve this problem. We design an adaptive scheme to select one of the models based on the contextual information extracted from input data, to perform anomaly detection. We evaluate our proposed approach using a real IoT dataset, and demonstrate that it reduces detection delay by 84% while maintaining almost the same accuracy as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-01-10T05:29:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.