AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference
under Stochastic Variance
- URL: http://arxiv.org/abs/2005.02544v1
- Date: Wed, 6 May 2020 00:30:29 GMT
- Title: AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference
under Stochastic Variance
- Authors: Young Geun Kim and Carole-Jean Wu
- Abstract summary: This paper proposes AutoScale to enable accurate, energy-efficient deep learning inference at the edge.
AutoScale is an adaptive, lightweight execution scaling engine built upon a custom-designed reinforcement learning algorithm.
- Score: 11.093360539563657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning inference is increasingly run at the edge. As the programming
and system stack support becomes mature, it enables acceleration opportunities
within a mobile system, where the system performance envelope is scaled up with
a plethora of programmable co-processors. Thus, intelligent services designed
for mobile users can choose to run inference on the CPU or any of the
co-processors on the mobile system, or to exploit connected systems, such as
the cloud or a nearby, locally connected system. By doing so, the services can
scale out the performance and increase the energy efficiency of edge mobile
systems. This gives rise to a new challenge: deciding when inference should
run where. Such an execution scaling decision becomes more complicated with the
stochastic nature of mobile-cloud execution, where signal strength variations
of the wireless networks and resource interference can significantly affect
real-time inference performance and system energy efficiency. To enable
accurate, energy-efficient deep learning inference at the edge, this paper
proposes AutoScale. AutoScale is an adaptive and light-weight execution scaling
engine built upon the custom-designed reinforcement learning algorithm. It
continuously learns and selects the most energy-efficient inference execution
target by taking into account characteristics of neural networks and available
systems in the collaborative cloud-edge execution environment while adapting to
the stochastic runtime variance. Real system implementation and evaluation,
considering realistic execution scenarios, demonstrate an average energy
efficiency improvement of 9.8x over the baseline mobile CPU and 1.6x over
cloud offloading for DNN edge inference, while meeting the real-time
performance and accuracy requirements.
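The abstract gives no implementation details, but the loop it describes, continuously learning per-target behaviour from noisy runtime observations and picking the execution target expected to be most energy-efficient under a latency constraint, can be sketched as a simple bandit-style learner. The sketch below is illustrative only and is not the authors' algorithm; the target list, the epsilon-greedy policy, the latency budget, and the measure_energy_and_latency callback are all assumptions.

```python
import random

# Hypothetical execution targets; a real system would enumerate the
# co-processors and connected systems actually available to the device.
TARGETS = ["cpu", "gpu", "dsp", "cloud"]

LATENCY_BUDGET_MS = 33.0   # assumed real-time constraint (roughly 30 FPS)
EPSILON = 0.1              # exploration rate
ALPHA = 0.2                # learning rate for the running cost estimates


class ExecutionScaler:
    """Bandit-style sketch of an energy-aware execution-target selector."""

    def __init__(self):
        # Estimated cost per target; lower is better. Starting at zero is
        # optimistic (costs are nonnegative), so every target gets tried.
        self.value = {t: 0.0 for t in TARGETS}

    def select(self):
        # Epsilon-greedy: mostly exploit the cheapest-looking target,
        # occasionally explore to track stochastic runtime variance
        # (signal-strength changes, co-running interference, ...).
        if random.random() < EPSILON:
            return random.choice(TARGETS)
        return min(TARGETS, key=lambda t: self.value[t])

    def update(self, target, energy_mj, latency_ms):
        # Penalise targets that miss the latency budget, so the policy
        # optimises energy only among QoS-feasible choices.
        cost = energy_mj + (1000.0 if latency_ms > LATENCY_BUDGET_MS else 0.0)
        self.value[target] += ALPHA * (cost - self.value[target])


def run_inference(scaler, frame, measure_energy_and_latency):
    """One decision round: pick a target, run, observe, learn."""
    target = scaler.select()
    energy_mj, latency_ms = measure_energy_and_latency(target, frame)
    scaler.update(target, energy_mj, latency_ms)
    return target
```

AutoScale itself uses a custom-designed reinforcement learning formulation that additionally conditions on neural-network and system characteristics; the sketch only captures the select-observe-update loop under stochastic variance.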
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the real-time inference demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC) [1.949471382288103]
Edge computing (AI at the Edge) on mobile devices is one of the optimized approaches for addressing this low-latency requirement.
In this work, the possibilities and challenges of implementing a low-latency and power-optimized smart mobile system are examined.
Various performance aspects and implementation feasibilities of Neural Networks (NNs) on embedded FPGA edge devices are discussed.
arXiv Detail & Related papers (2024-07-16T11:51:41Z)
- Energy-Efficient Federated Edge Learning with Streaming Data: A Lyapunov Optimization Approach [34.00679567444125]
We develop a dynamic scheduling and resource allocation algorithm to address the inherent randomness in data arrivals and resource availability under long-term energy constraints.
Our proposed algorithm makes adaptive decisions on device scheduling, computational capacity adjustment, and allocation of bandwidth and transmit power in every round.
The effectiveness of our scheme is verified through simulation results, demonstrating improved learning performance and energy efficiency as compared to baseline schemes. (A generic drift-plus-penalty sketch for this kind of long-term energy constraint appears after this list.)
arXiv Detail & Related papers (2024-05-20T14:13:22Z)
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method incurs less cost during inference while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - Offloading and Quality Control for AI Generated Content Services in 6G Mobile Edge Computing Networks [18.723955271182007]
This paper proposes a joint optimization algorithm for offloading decisions, computation time, and diffusion steps of the diffusion models in the reverse diffusion stage.
Experimental results conclusively demonstrate that the proposed algorithm achieves superior joint optimization performance compared to the baselines.
arXiv Detail & Related papers (2023-12-11T08:36:27Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered by Reconfigurable Intelligent Surfaces [30.1512069754603]
We propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge.
We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system.
arXiv Detail & Related papers (2023-05-18T12:46:42Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, the proposed approach can adapt well to complex environments, such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning [19.013102763434794]
This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server.
Our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn the optimal partition point.
ANS significantly outperforms state-of-the-art benchmarks in terms of tracking system changes and reducing the end-to-end inference delay. (A simple partition-point sketch illustrating this idea appears after this list.)
arXiv Detail & Related papers (2021-02-02T18:50:06Z)
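For the Lyapunov-optimization entry above (Energy-Efficient Federated Edge Learning with Streaming Data), per-round decisions under a long-term energy constraint are conventionally handled with a drift-plus-penalty rule: a virtual queue tracks how far cumulative energy use runs above the long-term budget, and each round minimises a weighted sum of the round's objective and that backlog. The following is a generic sketch of that pattern, not the paper's algorithm; the weight V, the budget e_budget, and the candidate tuples are assumptions.

```python
def drift_plus_penalty_round(q, candidates, V, e_budget):
    """Pick the per-round decision minimising V*cost + q*energy.

    q          -- current virtual energy-queue backlog
    candidates -- iterable of (decision, cost, energy) tuples, where `cost`
                  is the per-round objective (e.g., a loss or latency proxy)
                  and `energy` the energy that decision would consume
    V          -- tradeoff weight between the objective and the backlog
    e_budget   -- long-term average energy budget per round
    """
    decision, cost, energy = min(
        candidates, key=lambda c: V * c[1] + q * c[2]
    )
    # Virtual queue update: backlog grows when this round overspends the
    # long-term budget and drains when it underspends.
    q_next = max(q + energy - e_budget, 0.0)
    return decision, q_next
```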
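Likewise, the partition-point idea behind the Autodidactic Neurosurgeon entry can be illustrated with a simple latency model: execute the first k layers on the device, transmit the intermediate activation, and finish on the edge server, choosing the k with the smallest estimated end-to-end delay. The per-layer times, activation sizes, and uplink estimate below are made-up placeholders; ANS itself learns the relevant quantities online rather than relying on fixed profiles.

```python
# Hypothetical per-layer profiles for a small DNN (made-up numbers):
# device compute time (ms), server compute time (ms), and the size of
# the activation produced by each layer (KB).
DEVICE_MS = [4.0, 6.0, 9.0, 3.0, 2.0]
SERVER_MS = [0.5, 0.8, 1.2, 0.4, 0.3]
ACT_KB = [600, 300, 150, 80, 10]
INPUT_KB = 1200
UPLINK_KB_PER_S = 2000.0  # assumed current uplink throughput estimate


def best_partition_point():
    """Return (k, latency): run layers [0, k) on-device, the rest on the server."""
    best_k, best_ms = 0, float("inf")
    for k in range(len(DEVICE_MS) + 1):
        device = sum(DEVICE_MS[:k])              # on-device compute
        server = sum(SERVER_MS[k:])              # server-side compute
        size_kb = INPUT_KB if k == 0 else ACT_KB[k - 1]
        # No transfer if the whole model stays on the device.
        transfer = 0.0 if k == len(DEVICE_MS) else size_kb / UPLINK_KB_PER_S * 1000.0
        total = device + transfer + server
        if total < best_ms:
            best_k, best_ms = k, total
    return best_k, best_ms
```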