Calibration-Aided Edge Inference Offloading via Adaptive Model
Partitioning of Deep Neural Networks
- URL: http://arxiv.org/abs/2010.16335v2
- Date: Thu, 28 Jan 2021 14:05:38 GMT
- Title: Calibration-Aided Edge Inference Offloading via Adaptive Model
Partitioning of Deep Neural Networks
- Authors: Roberto G. Pacheco, Rodrigo S. Couto and Osvaldo Simeone
- Abstract summary: Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations.
This work shows that the employment of a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy.
In contrast, we argue that implementing a calibration algorithm prior to deployment can solve this problem, allowing for more reliable offloading decisions.
- Score: 30.800324092046793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile devices can offload deep neural network (DNN)-based inference to the
cloud, overcoming local hardware and energy limitations. However, offloading
adds communication delay, thus increasing the overall inference time, and hence
it should be used only when needed. An approach to address this problem
consists of the use of adaptive model partitioning based on early-exit DNNs.
Accordingly, the inference starts at the mobile device, and an intermediate
layer estimates the accuracy: if the estimated accuracy is sufficient, the
device takes the inference decision; otherwise, the remaining layers of the DNN
run in the cloud. Thus, the device offloads the inference to the cloud only if
it cannot classify a sample with high confidence. This offloading requires a
correct accuracy prediction at the device. Nevertheless, DNNs are typically
miscalibrated, providing overconfident decisions. This work shows that the
employment of a miscalibrated early-exit DNN for offloading via model
partitioning can significantly decrease inference accuracy. In contrast, we
argue that implementing a calibration algorithm prior to deployment can solve
this problem, allowing for more reliable offloading decisions.
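To make the mechanism concrete, the sketch below combines confidence-thresholded early-exit offloading with temperature scaling, a standard post-hoc calibration algorithm. The function names, the confidence threshold, the grid search, and the choice of temperature scaling itself are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch: calibrated early-exit offloading decision (assumed names/values).
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax over the last axis, with an optional calibration temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibrate_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature that minimizes negative log-likelihood on held-out data."""
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        probs = softmax(val_logits, T)
        nll = -np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

def offload_decision(early_exit_logits, temperature, threshold=0.8):
    """Return (classify_locally, predicted_class) for one sample.

    The early exit classifies on-device only when its calibrated confidence
    exceeds the threshold; otherwise the sample is offloaded to the cloud.
    """
    probs = softmax(early_exit_logits, temperature)
    return probs.max() >= threshold, int(probs.argmax())
```

With a miscalibrated early exit (temperature fixed at 1), overconfident softmax outputs keep too many samples on the device, so the thresholded decision no longer tracks the true accuracy; calibrating on held-out data restores the correspondence between confidence and accuracy, which is what makes the offloading decision reliable.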
Related papers
- Uncertainty Quantification over Graph with Conformalized Graph Neural
Networks [52.20904874696597]
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data.
GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant.
We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates.
arXiv Detail & Related papers (2023-05-23T21:38:23Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of the predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - Unsupervised Early Exit in DNNs with Multiple Exits [0.0]
We focus on Elastic BERT, a pre-trained multi-exit DNN, to demonstrate that it 'nearly' satisfies the Strong Dominance (SD) property.
We empirically validate our algorithm on IMDb and Yelp datasets.
arXiv Detail & Related papers (2022-09-20T05:35:54Z) - Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on
Edge Devices [19.335535517714703]
The prediction accuracy of deep neural networks (DNNs) deployed at the edge can degrade over time due to shifts in the distribution of the incoming data.
Recent prediction-time unsupervised DNN adaptation techniques improve the prediction accuracy of these models on noisy data by re-tuning the batch normalization parameters.
This paper, for the first time, performs a comprehensive measurement study of such techniques to quantify their performance and energy on various edge devices.
arXiv Detail & Related papers (2022-03-21T19:10:40Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation adapts well to complex environments, such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Early-exit deep neural networks for distorted images: providing an
efficient edge offloading [69.43216268165402]
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity.
We introduce expert side branches, each trained on a particular distortion type, to improve robustness against image distortion.
This approach increases the estimated accuracy on the edge, improving the offloading decisions.
arXiv Detail & Related papers (2021-08-20T19:52:55Z) - Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural
Accelerators with Application to Neural Architecture Search [25.841113960607334]
Emerging device-based Computing-in-Memory (CiM) has proven to be a promising candidate for energy-efficient deep neural network (DNN) computation.
Most emerging devices suffer from uncertainty issues, resulting in a difference between the value actually stored and the weight value the device is designed to hold.
This leads to an accuracy drop when trained models are deployed on actual platforms.
arXiv Detail & Related papers (2021-07-06T23:29:36Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - CacheNet: A Model Caching Framework for Deep Learning Inference on the
Edge [3.398008512297358]
CacheNet is a model caching framework for machine perception applications.
It caches low-complexity models on end devices and high-complexity (or full) models on edge or cloud servers.
It is 58-217% faster than baseline approaches that run inference tasks on end devices or edge servers alone.
arXiv Detail & Related papers (2020-07-03T16:32:14Z) - Increasing Trustworthiness of Deep Neural Networks via Accuracy
Monitoring [20.456742449675904]
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets.
This has raised significant concerns with trustworthiness of DNNs, especially in safety-critical applications.
We propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input.
arXiv Detail & Related papers (2020-07-03T03:09:36Z) - Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.