SplitEE: Early Exit in Deep Neural Networks with Split Computing
- URL: http://arxiv.org/abs/2309.09195v1
- Date: Sun, 17 Sep 2023 07:48:22 GMT
- Title: SplitEE: Early Exit in Deep Neural Networks with Split Computing
- Authors: Divya J. Bajpai, Vivek K. Trivedi, Sohan L. Yadav, and Manjesh K.
Hanawal
- Abstract summary: Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks.
However, deploying full-fledged DNNs on resource-constrained devices is difficult due to their large size.
We propose combining two approaches by using early exits in split computing.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have drawn attention because of their outstanding
performance on various tasks. However, deploying full-fledged DNNs in
resource-constrained devices (edge, mobile, IoT) is difficult due to their
large size. To overcome the issue, various approaches are considered, like
offloading part of the computation to the cloud for final inference (split
computing) or performing the inference at an intermediary layer without passing
through all layers (early exits). In this work, we propose combining both
approaches by using early exits in split computing. In our approach, we decide
up to what depth of DNNs computation to perform on the device (splitting layer)
and whether a sample can exit from this layer or need to be offloaded. The
decisions are based on a weighted combination of accuracy, computational, and
communication costs. We develop an algorithm named SplitEE to learn an optimal
policy. Since pre-trained DNNs are often deployed in new domains where the
ground truths may be unavailable and samples arrive in a streaming fashion,
SplitEE works in an online and unsupervised setup. We extensively perform
experiments on five different datasets. SplitEE achieves a significant cost
reduction ($>50\%$) with a slight drop in accuracy ($<2\%$) as compared to the
case when all samples are inferred at the final layer. The anonymized source
code is available at
https://anonymous.4open.science/r/SplitEE_M-B989/README.md.
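The exit-or-offload decision described in the abstract (a weighted combination of accuracy, computational, and communication costs) can be sketched as a simple cost comparison at the splitting layer. The snippet below is an illustrative sketch only, not the authors' implementation: the confidence proxy, cost values, and weights `mu` and `o` are hypothetical stand-ins for quantities SplitEE learns online from streaming, unlabeled samples.

```python
# Hypothetical sketch of an exit-vs-offload decision at a split layer.
# Weights and costs are illustrative; SplitEE learns its policy online
# without ground-truth labels, so confidence proxies for accuracy.

def split_decision(confidence: float,
                   compute_cost: float,   # on-device cost up to the split layer
                   offload_cost: float,   # cost of transmitting features to the cloud
                   mu: float = 1.0,       # weight on computational cost
                   o: float = 1.0) -> str:
    """Return 'exit' to keep the early-exit prediction on-device,
    or 'offload' to send intermediate features for final-layer inference."""
    # Exiting keeps the local prediction; its expected accuracy is proxied
    # by the exit head's confidence (no labels in the online setting).
    exit_reward = confidence - mu * compute_cost
    # Offloading trusts the full network (accuracy proxy ~1.0) but pays
    # both the on-device compute and the communication cost.
    offload_reward = 1.0 - mu * compute_cost - o * offload_cost
    return "exit" if exit_reward >= offload_reward else "offload"
```

For example, a confident sample (`confidence=0.95`) exits locally, while an uncertain one (`confidence=0.4`) is offloaded when the communication cost is moderate; raising `o` makes offloading progressively less attractive.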
Related papers
- Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach [5.402030962296633]
Deep Neural Networks (DNNs) have demonstrated outstanding performance across various domains.
A distributed inference setup can be used where a small-sized DNN is deployed on the mobile device, a larger version on the edge, and the full-fledged model on the cloud.
We develop a novel approach that utilizes Early Exit (EE) strategies developed to minimize inference latency in DNNs.
arXiv Detail & Related papers (2024-10-06T20:14:27Z)
- Early-exit Convolutional Neural Networks [10.320641540183198]
'Early-exit CNNs', EENets, adapt their computational cost based on the input by stopping the inference process at certain exit locations.
EENets achieve similar accuracy with their non-EE versions while reducing the computational cost to 20% of the original.
arXiv Detail & Related papers (2024-09-09T05:29:38Z)
- MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices.
We optimise a large network family using both labelled and unlabelled data.
We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
- I-SplitEE: Image classification in Split Computing DNNs with Early Exits [5.402030962296633]
Large size of Deep Neural Networks (DNNs) hinders deploying them on resource-constrained devices like edge, mobile, and IoT platforms.
Our work presents an innovative unified approach merging early exits and split computing.
I-SplitEE is an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data.
arXiv Detail & Related papers (2024-01-19T07:44:32Z)
- OFA$^2$: A Multi-Objective Perspective for the Once-for-All Neural Architecture Search [79.36688444492405]
Once-for-All (OFA) is a Neural Architecture Search (NAS) framework designed to address the problem of searching efficient architectures for devices with different resources constraints.
We aim to give one step further in the search for efficiency by explicitly conceiving the search stage as a multi-objective optimization problem.
arXiv Detail & Related papers (2023-03-23T21:30:29Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- DEFER: Distributed Edge Inference for Deep Neural Networks [5.672898304129217]
We present DEFER, a framework for distributed edge inference.
It partitions deep neural networks into layers that can be spread across multiple compute nodes.
We find that for the ResNet50 model, the inference throughput of DEFER with 8 compute nodes is 53% higher and per-node energy consumption is 63% lower than single-device inference.
arXiv Detail & Related papers (2022-01-18T06:50:45Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) algorithm, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.