Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource
Constrained IoT Systems
- URL: http://arxiv.org/abs/2306.12691v1
- Date: Thu, 22 Jun 2023 06:33:12 GMT
- Title: Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource
Constrained IoT Systems
- Authors: Juliano S. Assine, J. C. S. Santos Filho, Eduardo Valle, Marco
Levorato
- Abstract summary: We propose a novel split computing approach based on slimmable ensemble encoders.
The key advantage of our design is the ability to adapt computational load and transmitted data size in real-time with minimal overhead and time.
Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
- Score: 12.427821850039448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The execution of large deep neural networks (DNNs) at mobile edge devices
requires considerable consumption of critical resources, such as energy, while
imposing demands on hardware capabilities. In approaches based on edge
computing, the execution of the models is offloaded to a compute-capable device
positioned at the edge of 5G infrastructures. The main issue of the latter
class of approaches is the need to transport information-rich signals over
wireless links with limited and time-varying capacity. The recent split
computing paradigm attempts to resolve this impasse by distributing the
execution of DNN models across the layers of the systems to reduce the amount
of data to be transmitted while imposing minimal computing load on mobile
devices. In this context, we propose a novel split computing approach based on
slimmable ensemble encoders. The key advantage of our design is the ability to
adapt computational load and transmitted data size in real-time with minimal
overhead and time. This is in contrast with existing approaches, where the same
adaptation requires costly context switching and model loading. Moreover, our
model outperforms existing solutions in terms of compression efficacy and
execution time, especially in the context of weak mobile devices. We present a
comprehensive comparison with the most advanced split computing solutions, as
well as an experimental evaluation on GPU-less devices.
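The core idea of a slimmable encoder, a single set of weights that serves several widths so the device can shrink both its compute and the transmitted bottleneck without loading a second model, can be illustrated with a minimal NumPy sketch. All names, dimensions, and width factors below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

class SlimmableEncoder:
    """Toy width-adjustable encoder: one weight matrix serves several
    widths, so switching the operating point needs no model reload."""

    def __init__(self, in_dim, max_out_dim, widths=(0.25, 0.5, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((in_dim, max_out_dim)) * 0.1
        self.widths = widths

    def encode(self, x, width):
        """Run the encoder using only the first `width` fraction of channels."""
        assert width in self.widths, "unsupported width"
        k = int(self.W.shape[1] * width)          # active output channels
        return np.maximum(x @ self.W[:, :k], 0.0)  # ReLU on the sliced output
```

Selecting `width=0.25` cuts both the matrix multiply and the size of the feature tensor sent over the wireless link to a quarter; real slimmable networks additionally maintain width-specific normalization statistics, which this sketch omits.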
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can perform human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed in which sensing happens concurrently across devices rather than in a standalone fashion.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, offloading point, and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of both worlds to address several challenges.
We show that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude over device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z)
- Cost-effective Machine Learning Inference Offload for Edge Computing [0.3149883354098941]
This paper proposes a novel offloading mechanism by leveraging installed-base on-premises (edge) computational resources.
The proposed mechanism allows the edge devices to offload heavy and compute-intensive workloads to edge nodes instead of using remote cloud.
arXiv Detail & Related papers (2020-12-07T21:11:02Z)
- Edge Intelligence for Energy-efficient Computation Offloading and Resource Allocation in 5G Beyond [7.953533529450216]
5G beyond is an end-edge-cloud orchestrated network that can exploit heterogeneous capabilities of the end devices, edge servers, and the cloud.
In multi-user wireless networks, diverse application requirements and the possibility of various radio access modes for communication among devices make it challenging to design an optimal computation offloading scheme.
Deep Reinforcement Learning (DRL) is an emerging technique to address such an issue with limited and less accurate network information.
arXiv Detail & Related papers (2020-11-17T05:51:03Z)
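Several of the entries above, in particular Dynamic Split Computing, select the split point from the current channel state. Under simple assumptions (profiled per-layer device and server latencies and activation sizes, all values hypothetical), that selection rule can be sketched as:

```python
def best_split(device_ms, server_ms, feat_bytes, rate_bps):
    """Pick the split layer that minimizes end-to-end inference latency.

    device_ms[i]  : cumulative device compute time through layer i (ms)
    server_ms[i]  : remaining server compute time after layer i (ms)
    feat_bytes[i] : size of the activation transmitted when splitting at i
    rate_bps      : current wireless data rate (bits/s)
    """
    def latency(i):
        tx_ms = feat_bytes[i] * 8 / rate_bps * 1000.0  # transmission delay
        return device_ms[i] + tx_ms + server_ms[i]
    return min(range(len(device_ms)), key=latency)

# Illustrative profile: a fast link favors splitting early (offload almost
# everything), a slow link pushes more computation onto the device so a
# smaller activation crosses the link.
profile = dict(device_ms=[1, 10, 20], server_ms=[5, 2, 0],
               feat_bytes=[100_000, 20_000, 1_000])
print(best_split(**profile, rate_bps=1e9))  # → 0
print(best_split(**profile, rate_bps=1e6))  # → 2
```

The same argmin structure appears whether the split is chosen once at deployment or re-evaluated online as the data rate and server load vary over time.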
This list is automatically generated from the titles and abstracts of the papers on this site.