DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative
Inference
- URL: http://arxiv.org/abs/2306.01811v3
- Date: Fri, 23 Jun 2023 07:34:40 GMT
- Title: DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative
Inference
- Authors: Ziyang Zhang, Yang Zhao, Huan Li, Changyao Lin, and Jie Liu
- Abstract summary: We propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework.
It automatically co-optimizes the CPU, GPU and memory frequencies of edge devices, and the feature maps to be offloaded to cloud servers.
It reduces energy consumption by 33% on average compared to state-of-the-art schemes.
- Score: 12.095934624748686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to limited resources on edge devices and the differing characteristics of deep neural network (DNN) models, it is challenging to optimize DNN inference for both energy consumption and end-to-end latency on the edge. In addition to dynamic voltage and frequency scaling (DVFS), the edge-cloud architecture provides a collaborative approach to efficient DNN inference. However, existing edge-cloud collaborative inference methods do not optimize the various compute resources on edge devices. Thus, we propose DVFO, a
novel DVFS-enabled edge-cloud collaborative inference framework, which
co-optimizes DVFS and offloading parameters via deep reinforcement learning
(DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU and memory
frequencies of edge devices, and 2) the feature maps to be offloaded to cloud
servers. In addition, it leverages a thinking-while-moving concurrent mechanism
to accelerate the DRL learning process, and a spatial-channel attention
mechanism to extract DNN feature maps of secondary importance for workload
offloading. This approach improves inference performance for different DNN
models under various edge-cloud network conditions. Extensive evaluations using
two datasets and six widely-deployed DNN models on three heterogeneous edge
devices show that DVFO reduces energy consumption by 33% on average compared to
state-of-the-art schemes. Moreover, DVFO reduces end-to-end latency by 28.6% to
59.1% while keeping the average accuracy loss within 1%.
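To make the co-optimization concrete, here is a minimal Python sketch of the kind of joint action such a DRL agent could emit, and the weighted energy/latency trade-off it could be rewarded on. The frequency grids, the DvfoAction fields, and the reward weighting are illustrative assumptions, not DVFO's actual interface.

```python
# Minimal sketch (not DVFO's actual interface) of a joint DVFS + offloading
# action and a weighted energy/latency reward for a DRL agent.
# Frequency grids, field names, and weights below are illustrative assumptions.
from dataclasses import dataclass

CPU_FREQS_MHZ = [422, 729, 1036, 1344, 1651, 1958]  # assumed candidate grid
GPU_FREQS_MHZ = [306, 510, 714, 918, 1122, 1300]
MEM_FREQS_MHZ = [800, 1331, 1600, 1866]

@dataclass
class DvfoAction:
    cpu_idx: int          # index into CPU_FREQS_MHZ
    gpu_idx: int          # index into GPU_FREQS_MHZ
    mem_idx: int          # index into MEM_FREQS_MHZ
    offload_ratio: float  # fraction of feature-map channels sent to the cloud

def reward(energy_j: float, latency_s: float, alpha: float = 0.5) -> float:
    """Trade-off the agent maximizes; alpha balances energy vs. latency."""
    return -(alpha * energy_j + (1.0 - alpha) * latency_s)

# Example: mid-range frequencies on the edge, offload 40% of the channels.
action = DvfoAction(cpu_idx=3, gpu_idx=2, mem_idx=1, offload_ratio=0.4)
print(CPU_FREQS_MHZ[action.cpu_idx], action.offload_ratio)
```

In such a design, the policy would re-emit these four values per request as bandwidth and load change, and the spatial-channel attention scores would decide which feature-map channels fall into the offloaded fraction.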
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the limits of cloud-centric processing by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method achieve a 1.45-9.39x speedup over baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- SpikeBottleNet: Spike-Driven Feature Compression Architecture for Edge-Cloud Co-Inference [0.86325068644655]
We propose SpikeBottleNet, a novel architecture for edge-cloud co-inference systems.
SpikeBottleNet integrates a spiking neuron model to significantly reduce energy consumption on edge devices.
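As a rough illustration of why spike-driven compression saves energy, below is a toy leaky integrate-and-fire (LIF) encoder that turns dense feature maps into sparse binary spike trains. The step count, leak factor, and threshold are made-up parameters, not SpikeBottleNet's actual design.

```python
# Toy leaky integrate-and-fire (LIF) encoder: dense features become sparse
# binary spikes, which are cheap to compute on and transmit. The step count,
# leak factor, and threshold are made-up values, not SpikeBottleNet's design.
import numpy as np

def lif_encode(features, steps=4, tau=0.8, v_th=1.0):
    v = np.zeros_like(features, dtype=float)  # membrane potential
    spikes = []
    for _ in range(steps):        # unroll over discrete time steps
        v = tau * v + features    # leaky integration of the input current
        fired = v >= v_th
        spikes.append(fired.astype(np.uint8))
        v[fired] = 0.0            # hard reset after firing
    return np.stack(spikes)       # (steps, *features.shape) binary tensor

codes = lif_encode(np.random.rand(16, 8, 8))
print(codes.mean())               # spike rate tracks input activity
```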
arXiv Detail & Related papers (2024-10-11T09:59:21Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
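For intuition, here is a toy version of that split-point decision under assumed per-layer timings and tensor sizes: choose the cut that minimizes edge compute plus transfer plus cloud compute at the currently measured uplink rate. All numbers are placeholders, not the paper's setup.

```python
# Toy split-point selection: run layers [0, s) on the edge, send the tensor at
# cut s over the uplink, run layers [s, n) in the cloud; pick the cheapest s.
def best_split(edge_ms, cloud_ms, size_mb, uplink_mbps):
    """edge_ms[i]/cloud_ms[i]: per-layer times; size_mb[s]: tensor size at cut s
    (size_mb[0] is the raw input, size_mb[n] the final output)."""
    n = len(edge_ms)
    costs = []
    for s in range(n + 1):
        transfer_ms = size_mb[s] * 8.0 / uplink_mbps * 1000.0
        costs.append(sum(edge_ms[:s]) + transfer_ms + sum(cloud_ms[s:]))
    s_best = min(range(n + 1), key=costs.__getitem__)
    return s_best, costs[s_best]

# Re-evaluated whenever the measured data rate or server load changes:
split, latency_ms = best_split([5, 8, 12], [1, 2, 3], [6.0, 2.0, 0.5, 0.01], 20.0)
```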
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
With a latency- and accuracy-aware reward design, the computation adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [3.3767251810292955]
FuSeConv is a drop-in replacement for depthwise separable convolutions.
FuSeConv fully factorizes convolutions along their spatial and depth dimensions.
Neural Operator Scaffolding scaffolds the training of FuSeConv by distilling knowledge from depthwise separable convolutions.
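A hedged sketch of such a fully-separable block, assuming the common split-and-1D-depthwise formulation; the exact layer arrangement in FuSeConv may differ.

```python
# Sketch of a fully-separable block in the spirit of FuSeConv: replace a KxK
# depthwise convolution with parallel 1xK and Kx1 depthwise convolutions on
# split channels, then mix pointwise. Exact layer arrangement is an assumption.
import torch
import torch.nn as nn

class FuSeConvBlock(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        half = channels // 2
        # each branch convolves half the channels along one spatial axis only
        self.row = nn.Conv2d(half, half, (1, k), padding=(0, k // 2), groups=half)
        self.col = nn.Conv2d(half, half, (k, 1), padding=(k // 2, 0), groups=half)
        self.mix = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing

    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return self.mix(torch.cat([self.row(a), self.col(b)], dim=1))

y = FuSeConvBlock(32)(torch.randn(1, 32, 56, 56))  # -> (1, 32, 56, 56)
```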
arXiv Detail & Related papers (2021-08-25T19:22:25Z)
- Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices [1.6873748786804317]
We argue that running CNNs between an edge device and the cloud is equivalent to solving a resource-constrained optimization problem.
Experiments on real-world edge devices show that LMOS ensures feasible execution of different CNN models at the edge.
arXiv Detail & Related papers (2021-07-19T19:39:56Z)
- AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference [16.847204351692632]
AppealNet is a novel edge/cloud collaborative architecture that runs deep learning (DL) tasks more efficiently than state-of-the-art solutions.
For a given input, AppealNet accurately predicts on-the-fly whether it can be successfully processed by the DL model deployed on the resource-constrained edge device.
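A minimal sketch of that two-head idea, assuming a shared feature extractor with a classifier head plus a lightweight "appeal" head whose score gates cloud offloading; the head structure and threshold are hypothetical.

```python
# Minimal sketch of a two-head, AppealNet-style edge model: a small auxiliary
# head scores how likely the edge prediction is to be wrong; high scores get
# "appealed" to the cloud model. Head structure and threshold are hypothetical.
import torch
import torch.nn as nn

class EdgeWithAppeal(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                   # shared feature extractor
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.appeal_head = nn.Linear(feat_dim, 1)  # predicts "edge will fail"

    def forward(self, x, threshold: float = 0.5):
        feats = self.backbone(x)
        logits = self.classifier(feats)
        p_fail = torch.sigmoid(self.appeal_head(feats))
        return logits, p_fail > threshold          # True -> offload to cloud

model = EdgeWithAppeal(nn.Flatten(), 28 * 28, 10)  # trivial backbone for demo
logits, offload = model(torch.randn(4, 1, 28, 28))
```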
arXiv Detail & Related papers (2021-05-10T04:13:35Z)
- Dynamic DNN Decomposition for Lossless Synergistic Inference [0.9549013615433989]
Deep neural networks (DNNs) sustain high performance in today's data processing applications.
We propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss.
D3 outperforms state-of-the-art counterparts by up to 3.4x in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68x.
arXiv Detail & Related papers (2021-01-15T03:18:53Z)
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)