Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- URL: http://arxiv.org/abs/2210.05182v1
- Date: Tue, 11 Oct 2022 06:35:45 GMT
- Title: Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- Authors: Tinghao Zhang, Zhijun Li, Yongrui Chen, Kwok-Yan Lam, Jun Zhao
- Abstract summary: An edge-cloud cooperation framework is proposed to improve inference accuracy while maintaining low inference latency.
We deploy a lightweight model on the edge and a heavyweight model on the cloud.
Our method reduces up to 78.8% inference latency and achieves higher accuracy compared with the cloud-only strategy.
- Score: 23.4463067406809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have been widely applied in Internet of Things
(IoT) systems for various tasks such as image classification and object
detection. However, heavyweight DNN models can hardly be deployed on edge
devices due to limited computational resources. In this paper, an edge-cloud
cooperation framework is proposed to improve inference accuracy while
maintaining low inference latency. To this end, we deploy a lightweight model
on the edge and a heavyweight model on the cloud. A reinforcement learning
(RL)-based DNN compression approach is used to generate the lightweight model
suitable for the edge from the heavyweight model. Moreover, a supervised
learning (SL)-based offloading strategy is applied to determine whether the
sample should be processed on the edge or on the cloud. Our method is
implemented on real hardware and tested on multiple datasets. The experimental
results show that (1) The sizes of the lightweight models obtained by RL-based
DNN compression are up to 87.6% smaller than those obtained by the baseline
method; (2) SL-based offloading strategy makes correct offloading decisions in
most cases; (3) Our method reduces up to 78.8% inference latency and achieves
higher accuracy compared with the cloud-only strategy.
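The decision loop described in the abstract can be sketched in a few lines. Everything below is a stand-in: the "models" are random scorers, and a softmax-confidence threshold replaces the paper's trained SL-based offloader, so the sketch only shows the control flow, not the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Numerically stable softmax.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def edge_model(x):
    # Stand-in for the RL-compressed lightweight model: class logits.
    return rng.normal(size=10)

def cloud_model(x):
    # Stand-in for the heavyweight cloud model: sharper logits.
    return rng.normal(size=10) * 3.0

def should_offload(edge_logits, threshold=0.6):
    # Illustrative stand-in for the offloading decision: defer to the
    # cloud when the edge model's top softmax probability is low.
    # (The paper instead trains a supervised classifier for this.)
    return softmax(edge_logits).max() < threshold

def infer(x):
    logits = edge_model(x)
    if should_offload(logits):
        # Pay the network/latency cost only for hard samples.
        logits, source = cloud_model(x), "cloud"
    else:
        source = "edge"
    return int(np.argmax(logits)), source

pred, source = infer(np.ones((3, 32, 32), dtype=np.float32))
print(pred, source)
```

Easy samples stay on the edge; only uncertain ones incur the round trip to the cloud, which is where the reported latency savings come from.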
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary.
Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain.
Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
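A minimal sketch of the DMD idea, under heavy assumptions: the two "models" below are toy linear scorers, disagreement between them supplies the unsupervised pseudo-labels, and the "switcher" is a one-feature logistic regression on the edge model's margin. The actual DMD method trains a neural switcher on real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
direction = np.array([1.0, -1.0])

def small_model_score(X):
    # Stand-in edge model: noisy linear score (sign = predicted class).
    return X @ direction + rng.normal(0, 1.0, len(X))

def large_model_score(X):
    # Stand-in cloud model: the same rule without the noise.
    return X @ direction

X = rng.normal(size=(500, 2))
s_small = small_model_score(X)
# Unsupervised pseudo-labels in DMD's spirit: 1 where the two models
# disagree, i.e. where the edge output is unreliable.
y = (np.sign(s_small) != np.sign(large_model_score(X))).astype(float)

# Tiny logistic-regression switcher on the edge model's margin |score|:
# small margins should predict disagreement.
f = np.abs(s_small)
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * f + b)))
    g = p - y
    w -= 0.1 * (g * f).mean()
    b -= 0.1 * g.mean()

defer = (1.0 / (1.0 + np.exp(-(w * f + b)))) > 0.5
print(f"switcher defers {defer.mean():.0%} of inputs to the large model")
```

The key point is that no ground-truth labels are needed: agreement between the two models is the training signal.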
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective distributed training technique.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks [0.5849513679510833]
A disadvantage of Deep Neural Networks (DNNs) is their vulnerability to adversarial attacks, as they can be fooled by adding slight perturbations to the inputs.
This paper reports the results of devising a tiny DNN model, robust to black-box and white-box adversarial attacks, trained with an automatic quantization-aware training framework.
arXiv Detail & Related papers (2023-04-25T13:56:35Z)
- HAVANA: Hard negAtiVe sAmples aware self-supervised coNtrastive leArning for Airborne laser scanning point clouds semantic segmentation [9.310873951428238]
This work proposes a hard-negative sample aware self-supervised contrastive learning method to pre-train the model for semantic segmentation.
The proposed HAVANA method still achieves over 94% of the performance of the fully supervised paradigm trained on the full training set.
arXiv Detail & Related papers (2022-10-19T15:05:17Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
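As a generic illustration of what quantization at variable bit widths means (uniform symmetric rounding over a shared scale; this is not the paper's compressible-subspace method):

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization of a weight tensor to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the integer grid, clip, and map back to float.
    return np.round(w / scale).clip(-qmax, qmax) * scale

w = np.linspace(-1, 1, 9)
for bits in (8, 4, 2):
    err = np.abs(quantize(w, bits) - w).max()
    print(bits, "bits -> max error", round(err, 4))
```

Lower bit widths shrink the model but raise the worst-case rounding error, which is exactly the accuracy-efficiency trade-off the paper exposes at inference time.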
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference [16.847204351692632]
AppealNet is a novel edge/cloud collaborative architecture that runs deep learning (DL) tasks more efficiently than state-of-the-art solutions.
For a given input, AppealNet accurately predicts on-the-fly whether it can be successfully processed by the DL model deployed on the resource-constrained edge device.
arXiv Detail & Related papers (2021-05-10T04:13:35Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models against the roofline model and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method delivers state-of-the-art network compression while achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z)
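For reference, the standard knowledge-distillation loss underlying such frameworks is the Hinton-style soft-target cross-entropy shown below; the paper's block-wise optimization is not modeled here.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened, numerically stable softmax.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's and student's temperature-
    # softened distributions, scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    return -(p_t * log_p_s).sum(axis=-1).mean() * T * T

teacher = np.array([[5.0, 1.0, -2.0]])
student = np.array([[4.0, 0.5, -1.0]])
print(round(float(distillation_loss(student, teacher)), 4))
```

The loss is minimized when the student reproduces the teacher's softened distribution, which is what lets a compact CNN absorb a larger model's behavior.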
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.