Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- URL: http://arxiv.org/abs/2210.05182v1
- Date: Tue, 11 Oct 2022 06:35:45 GMT
- Title: Edge-Cloud Cooperation for DNN Inference via Reinforcement Learning and
Supervised Learning
- Authors: Tinghao Zhang, Zhijun Li, Yongrui Chen, Kwok-Yan Lam, Jun Zhao
- Abstract summary: An edge-cloud cooperation framework is proposed to improve inference accuracy while maintaining low inference latency.
We deploy a lightweight model on the edge and a heavyweight model on the cloud.
Our method reduces up to 78.8% inference latency and achieves higher accuracy compared with the cloud-only strategy.
- Score: 23.4463067406809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have been widely applied in Internet of Things
(IoT) systems for various tasks such as image classification and object
detection. However, heavyweight DNN models can hardly be deployed on edge
devices due to limited computational resources. In this paper, an edge-cloud
cooperation framework is proposed to improve inference accuracy while
maintaining low inference latency. To this end, we deploy a lightweight model
on the edge and a heavyweight model on the cloud. A reinforcement learning
(RL)-based DNN compression approach is used to generate the lightweight model
suitable for the edge from the heavyweight model. Moreover, a supervised
learning (SL)-based offloading strategy is applied to determine whether the
sample should be processed on the edge or on the cloud. Our method is
implemented on real hardware and tested on multiple datasets. The experimental
results show that (1) The sizes of the lightweight models obtained by RL-based
DNN compression are up to 87.6% smaller than those obtained by the baseline
method; (2) SL-based offloading strategy makes correct offloading decisions in
most cases; (3) Our method reduces up to 78.8% inference latency and achieves
higher accuracy compared with the cloud-only strategy.
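The decision loop described in the abstract can be sketched in a few lines. Everything below is a stand-in: the "models" are random scorers, and a softmax-confidence threshold replaces the paper's trained SL-based offloader, so the sketch only shows the control flow, not the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Numerically stable softmax.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def edge_model(x):
    # Stand-in for the RL-compressed lightweight model: class logits.
    return rng.normal(size=10)

def cloud_model(x):
    # Stand-in for the heavyweight cloud model: sharper logits.
    return rng.normal(size=10) * 3.0

def should_offload(edge_logits, threshold=0.6):
    # Illustrative stand-in for the offloading decision: defer to the
    # cloud when the edge model's top softmax probability is low.
    # (The paper instead trains a supervised classifier for this.)
    return softmax(edge_logits).max() < threshold

def infer(x):
    logits = edge_model(x)
    if should_offload(logits):
        # Pay the network/latency cost only for hard samples.
        logits, source = cloud_model(x), "cloud"
    else:
        source = "edge"
    return int(np.argmax(logits)), source

pred, source = infer(np.ones((3, 32, 32), dtype=np.float32))
print(pred, source)
```

Easy samples stay on the edge; only uncertain ones incur the round trip to the cloud, which is where the reported latency savings come from.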
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary.
Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain.
Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
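A minimal sketch of the DMD idea, under heavy assumptions: the two "models" below are toy linear scorers, disagreement between them supplies the unsupervised pseudo-labels, and the "switcher" is a one-feature logistic regression on the edge model's margin. The actual DMD method trains a neural switcher on real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
direction = np.array([1.0, -1.0])

def small_model_score(X):
    # Stand-in edge model: noisy linear score (sign = predicted class).
    return X @ direction + rng.normal(0, 1.0, len(X))

def large_model_score(X):
    # Stand-in cloud model: the same rule without the noise.
    return X @ direction

X = rng.normal(size=(500, 2))
s_small = small_model_score(X)
# Unsupervised pseudo-labels in DMD's spirit: 1 where the two models
# disagree, i.e. where the edge output is unreliable.
y = (np.sign(s_small) != np.sign(large_model_score(X))).astype(float)

# Tiny logistic-regression switcher on the edge model's margin |score|:
# small margins should predict disagreement.
f = np.abs(s_small)
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * f + b)))
    g = p - y
    w -= 0.1 * (g * f).mean()
    b -= 0.1 * g.mean()

defer = (1.0 / (1.0 + np.exp(-(w * f + b)))) > 0.5
print(f"switcher defers {defer.mean():.0%} of inputs to the large model")
```

The key point is that no ground-truth labels are needed: agreement between the two models is the training signal.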
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective distributed training technique.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks [0.5849513679510833]
A disadvantage of Deep Neural Networks (DNNs) is their vulnerability to adversarial attacks, as they can be fooled by adding slight perturbations to the inputs.
This paper reports the results of devising a tiny DNN model, robust to black-box and white-box adversarial attacks, trained with an automatic quantization-aware training framework.
arXiv Detail & Related papers (2023-04-25T13:56:35Z)
- HAVANA: Hard negAtiVe sAmples aware self-supervised coNtrastive leArning for Airborne laser scanning point clouds semantic segmentation [9.310873951428238]
This work proposes a hard-negative sample aware self-supervised contrastive learning method to pre-train the model for semantic segmentation.
The proposed HAVANA method still achieves over 94% of the performance of the fully supervised paradigm trained on the full training set.
arXiv Detail & Related papers (2022-10-19T15:05:17Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
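As a generic illustration of what quantization at variable bit widths means (uniform symmetric rounding over a shared scale; this is not the paper's compressible-subspace method):

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization of a weight tensor to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the integer grid, clip, and map back to float.
    return np.round(w / scale).clip(-qmax, qmax) * scale

w = np.linspace(-1, 1, 9)
for bits in (8, 4, 2):
    err = np.abs(quantize(w, bits) - w).max()
    print(bits, "bits -> max error", round(err, 4))
```

Lower bit widths shrink the model but raise the worst-case rounding error, which is exactly the accuracy-efficiency trade-off the paper exposes at inference time.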
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference [16.847204351692632]
AppealNet is a novel edge/cloud collaborative architecture that runs deep learning (DL) tasks more efficiently than state-of-the-art solutions.
For a given input, AppealNet accurately predicts on-the-fly whether it can be successfully processed by the DL model deployed on the resource-constrained edge device.
arXiv Detail & Related papers (2021-05-10T04:13:35Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models against the roofline model and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method delivers state-of-the-art network compression while achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z)
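For reference, the standard knowledge-distillation loss underlying such frameworks is the Hinton-style soft-target cross-entropy shown below; the paper's block-wise optimization is not modeled here.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened, numerically stable softmax.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's and student's temperature-
    # softened distributions, scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    return -(p_t * log_p_s).sum(axis=-1).mean() * T * T

teacher = np.array([[5.0, 1.0, -2.0]])
student = np.array([[4.0, 0.5, -1.0]])
print(round(float(distillation_loss(student, teacher)), 4))
```

The loss is minimized when the student reproduces the teacher's softened distribution, which is what lets a compact CNN absorb a larger model's behavior.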
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.