Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- URL: http://arxiv.org/abs/2210.05182v1
- Date: Tue, 11 Oct 2022 06:35:45 GMT
- Title: Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- Authors: Tinghao Zhang, Zhijun Li, Yongrui Chen, Kwok-Yan Lam, Jun Zhao
- Abstract summary: An edge-cloud cooperation framework is proposed to improve inference accuracy while maintaining low inference latency.
We deploy a lightweight model on the edge and a heavyweight model on the cloud.
Our method reduces inference latency by up to 78.8% and achieves higher accuracy compared with the cloud-only strategy.
- Score: 23.4463067406809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have been widely applied in Internet of Things
(IoT) systems for various tasks such as image classification and object
detection. However, heavyweight DNN models can hardly be deployed on edge
devices due to limited computational resources. In this paper, an edge-cloud
cooperation framework is proposed to improve inference accuracy while
maintaining low inference latency. To this end, we deploy a lightweight model
on the edge and a heavyweight model on the cloud. A reinforcement learning
(RL)-based DNN compression approach is used to generate the lightweight model
suitable for the edge from the heavyweight model. Moreover, a supervised
learning (SL)-based offloading strategy is applied to determine whether the
sample should be processed on the edge or on the cloud. Our method is
implemented on real hardware and tested on multiple datasets. The experimental
results show that (1) The sizes of the lightweight models obtained by RL-based
DNN compression are up to 87.6% smaller than those obtained by the baseline
method; (2) SL-based offloading strategy makes correct offloading decisions in
most cases; (3) Our method reduces inference latency by up to 78.8% and achieves
higher accuracy compared with the cloud-only strategy.
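As a minimal sketch of the offloading idea described above: the edge runs the lightweight model first, and a per-sample decision rule determines whether to accept the edge prediction or forward the sample to the heavyweight cloud model. The paper trains a supervised classifier for this decision; the confidence threshold used here is an illustrative stand-in, and `should_offload`, `infer`, and the threshold value are hypothetical names, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_offload(edge_logits, threshold=0.8):
    """Stand-in for the SL-based offloading classifier: offload a
    sample when the edge model's top-class confidence is low."""
    return max(softmax(edge_logits)) < threshold

def infer(sample, edge_model, cloud_model, threshold=0.8):
    """Run the lightweight edge model first; fall back to the cloud
    model only when the offloading rule deems the edge result
    unreliable. Returns (prediction, where_it_ran)."""
    logits = edge_model(sample)
    if should_offload(logits, threshold):
        return cloud_model(sample), "cloud"
    probs = softmax(logits)
    return probs.index(max(probs)), "edge"
```

Confident edge predictions are served locally (low latency), while ambiguous samples pay the round-trip cost for the cloud model's higher accuracy, which is the trade-off the abstract describes.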
Related papers
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks [0.5849513679510833]
A disadvantage of Deep Neural Networks (DNNs) is their vulnerability to adversarial attacks, as they can be fooled by adding slight perturbations to the inputs.
This paper reports the results of devising a tiny DNN model, robust to black-box and white-box adversarial attacks, trained with an automatic quantization-aware training framework.
arXiv Detail & Related papers (2023-04-25T13:56:35Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution of both the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- HAVANA: Hard negAtiVe sAmples aware self-supervised coNtrastive leArning for Airborne laser scanning point clouds semantic segmentation [9.310873951428238]
This work proposes a hard-negative sample aware self-supervised contrastive learning method to pre-train the model for semantic segmentation.
The results obtained by the proposed HAVANA method still exceed 94% of the performance of the supervised paradigm trained with the full training set.
arXiv Detail & Related papers (2022-10-19T15:05:17Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference [16.847204351692632]
AppealNet is a novel edge/cloud collaborative architecture that runs deep learning (DL) tasks more efficiently than state-of-the-art solutions.
For a given input, AppealNet accurately predicts on-the-fly whether it can be successfully processed by the DL model deployed on the resource-constrained edge device.
arXiv Detail & Related papers (2021-05-10T04:13:35Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method results in a state of the art network compression while being capable of achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z)
- Edge-Detect: Edge-centric Network Intrusion Detection using Deep Neural Network [0.0]
Edge nodes are crucial for detection against multitudes of cyber attacks on Internet-of-Things endpoints.
We develop a novel light, fast and accurate 'Edge-Detect' model, which detects Denial of Service attacks on edge nodes using DLM techniques.
arXiv Detail & Related papers (2021-02-03T04:24:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.