Calibration-Aided Edge Inference Offloading via Adaptive Model
Partitioning of Deep Neural Networks
- URL: http://arxiv.org/abs/2010.16335v2
- Date: Thu, 28 Jan 2021 14:05:38 GMT
- Title: Calibration-Aided Edge Inference Offloading via Adaptive Model
Partitioning of Deep Neural Networks
- Authors: Roberto G. Pacheco, Rodrigo S. Couto and Osvaldo Simeone
- Abstract summary: Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations.
This work shows that the employment of a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy.
In contrast, we argue that implementing a calibration algorithm prior to deployment can solve this problem, allowing for more reliable offloading decisions.
- Score: 30.800324092046793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile devices can offload deep neural network (DNN)-based inference to the
cloud, overcoming local hardware and energy limitations. However, offloading
adds communication delay, thus increasing the overall inference time, and hence
it should be used only when needed. An approach to address this problem
consists of the use of adaptive model partitioning based on early-exit DNNs.
Accordingly, the inference starts at the mobile device, and an intermediate
layer estimates the accuracy: if the estimated accuracy is sufficient, the
device takes the inference decision; otherwise, the remaining layers of the DNN
run in the cloud. Thus, the device offloads the inference to the cloud only if
it cannot classify a sample with high confidence. This offloading requires a
correct accuracy prediction at the device. Nevertheless, DNNs are typically
miscalibrated, providing overconfident decisions. This work shows that the
employment of a miscalibrated early-exit DNN for offloading via model
partitioning can significantly decrease inference accuracy. In contrast, we
argue that implementing a calibration algorithm prior to deployment can solve
this problem, allowing for more reliable offloading decisions.
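To make the mechanism concrete, the sketch below combines confidence-thresholded early-exit offloading with temperature scaling, a standard post-hoc calibration algorithm. The function names, the confidence threshold, the grid search, and the choice of temperature scaling itself are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch: calibrated early-exit offloading decision (assumed names/values).
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax over the last axis, with an optional calibration temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibrate_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature that minimizes negative log-likelihood on held-out data."""
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        probs = softmax(val_logits, T)
        nll = -np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

def offload_decision(early_exit_logits, temperature, threshold=0.8):
    """Return (classify_locally, predicted_class) for one sample.

    The early exit classifies on-device only when its calibrated confidence
    exceeds the threshold; otherwise the sample is offloaded to the cloud.
    """
    probs = softmax(early_exit_logits, temperature)
    return probs.max() >= threshold, int(probs.argmax())
```

With a miscalibrated early exit (temperature fixed at 1), overconfident softmax outputs keep too many samples on the device, so the thresholded decision no longer tracks the true accuracy; calibrating on held-out data restores the correspondence between confidence and accuracy, which is what makes the offloading decision reliable.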
Related papers
- Uncertainty Quantification over Graph with Conformalized Graph Neural
Networks [52.20904874696597]
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data.
GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant.
We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates.
arXiv Detail & Related papers (2023-05-23T21:38:23Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of the predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - Unsupervised Early Exit in DNNs with Multiple Exits [0.0]
We focus on Elastic BERT, a pre-trained multi-exit DNN, to demonstrate that it 'nearly' satisfies the Strong Dominance (SD) property.
We empirically validate our algorithm on IMDb and Yelp datasets.
arXiv Detail & Related papers (2022-09-20T05:35:54Z) - Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on
Edge Devices [19.335535517714703]
The prediction accuracy of deep neural networks (DNNs) deployed at the edge can degrade over time due to shifts in the distribution of the incoming data.
Recent prediction-time unsupervised DNN adaptation techniques improve the prediction accuracy of these models on noisy data by re-tuning the batch normalization parameters.
This paper, for the first time, performs a comprehensive measurement study of such techniques to quantify their performance and energy on various edge devices.
arXiv Detail & Related papers (2022-03-21T19:10:40Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation adapts well to complex environments, such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Early-exit deep neural networks for distorted images: providing an
efficient edge offloading [69.43216268165402]
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity.
We introduce expert side branches, each trained on a particular distortion type, to improve robustness against image distortion.
This approach increases the estimated accuracy on the edge, improving the offloading decisions.
arXiv Detail & Related papers (2021-08-20T19:52:55Z) - Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural
Accelerators with Application to Neural Architecture Search [25.841113960607334]
Emerging device-based Computing-in-Memory (CiM) has proven to be a promising candidate for energy-efficient deep neural network (DNN) computation.
Most emerging devices suffer from uncertainty issues, resulting in a difference between the value actually stored and the weight value the device is designed to hold.
This leads to an accuracy drop when trained models are deployed on actual platforms.
arXiv Detail & Related papers (2021-07-06T23:29:36Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - CacheNet: A Model Caching Framework for Deep Learning Inference on the
Edge [3.398008512297358]
CacheNet is a model caching framework for machine perception applications.
It caches low-complexity models on end devices and high-complexity (or full) models on edge or cloud servers.
It is 58-217% faster than baseline approaches that run inference tasks on end devices or edge servers alone.
arXiv Detail & Related papers (2020-07-03T16:32:14Z) - Increasing Trustworthiness of Deep Neural Networks via Accuracy
Monitoring [20.456742449675904]
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets.
This has raised significant concerns with trustworthiness of DNNs, especially in safety-critical applications.
We propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input.
arXiv Detail & Related papers (2020-07-03T03:09:36Z) - Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.