Offloading Algorithms for Maximizing Inference Accuracy on Edge Device
Under a Time Constraint
- URL: http://arxiv.org/abs/2112.11413v1
- Date: Tue, 21 Dec 2021 18:21:24 GMT
- Title: Offloading Algorithms for Maximizing Inference Accuracy on Edge Device
Under a Time Constraint
- Authors: Andrea Fresa and Jaya Prakash Champati
- Abstract summary: We propose an approximation algorithm AMR2 and prove that it results in a makespan of at most 2T while achieving a total accuracy that is lower than the optimal total accuracy by only a small constant.
As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNet and connected to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for an image classification application.
- Score: 15.038891477389535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the emergence of edge computing, the problem of offloading jobs between
an Edge Device (ED) and an Edge Server (ES) has received significant attention
in the past. Motivated by the fact that an increasing number of applications are
using Machine Learning (ML) inference, we study the problem of offloading
inference jobs by considering the following novel aspects: 1) in contrast to a
typical computational job, the processing time of an inference job depends on
the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs)
for resource-constrained devices provide the choice of scaling the model size.
We formulate an assignment problem with the aim of maximizing the total
inference accuracy of n data samples available at the ED, subject to a time
constraint T on the makespan. We propose an approximation algorithm AMR2 and
prove that it results in a makespan of at most 2T while achieving a total
accuracy that is lower than the optimal total accuracy by only a small
constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi
equipped with MobileNet and connected to a server equipped with ResNet, and
studied the total accuracy and makespan performance of AMR2 for an image
classification application.
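To make the setup concrete, below is a minimal, hedged sketch of the assignment problem described in the abstract. It is not the authors' AMR2 algorithm, only a simple greedy heuristic over an assumed cost model: each sample runs either on the ED (time t_ed, accuracy a_ed, e.g. MobileNet) or on the ES path (time t_es covering transmission plus server inference, accuracy a_es, e.g. ResNet); the two machines work in parallel, so the makespan is the larger of the two total loads. All names and numbers are illustrative assumptions.

```python
# Hedged sketch of the ED/ES inference-offloading assignment.
# NOT the paper's AMR2 algorithm; an illustrative greedy heuristic only.

def greedy_offload(samples, T):
    """samples: list of (t_ed, a_ed, t_es, a_es) tuples.
    t_ed/a_ed: inference time and expected accuracy on the Edge Device;
    t_es/a_es: transmission-plus-inference time and accuracy via the
    Edge Server. ED and ES run in parallel, so the makespan is the
    larger of the two total loads."""
    n = len(samples)
    assign = ["ED"] * n                       # data starts on the device
    load_ed = sum(s[0] for s in samples)
    load_es = 0.0
    # Offload greedily by accuracy gain per unit of server time.
    order = sorted(range(n),
                   key=lambda i: (samples[i][3] - samples[i][1]) / samples[i][2],
                   reverse=True)
    for i in order:
        t_ed, a_ed, t_es, a_es = samples[i]
        if a_es > a_ed and load_es + t_es <= T:
            assign[i] = "ES"
            load_ed -= t_ed
            load_es += t_es
    makespan = max(load_ed, load_es)
    total_acc = sum(s[3] if a == "ES" else s[1]
                    for s, a in zip(samples, assign))
    return assign, makespan, total_acc

# Illustrative run: 100 images, 70% device accuracy vs. 76% server accuracy.
samples = [(0.03, 0.70, 0.05, 0.76)] * 100
assignment, makespan, acc = greedy_offload(samples, T=2.0)
print(makespan, acc)
```

Unlike AMR2, which the paper proves achieves a makespan of at most 2T with accuracy within a small constant of optimal, a heuristic like this offers no such guarantee: if the residual ED load exceeds T, the schedule overshoots the budget, which is exactly the slack that the paper's approximation bound formalizes.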
Related papers
- Active Inference on the Edge: A Design Study [5.815300670677979]
Active Inference (ACI) is a concept from neuroscience that describes how the brain constantly predicts and evaluates sensory information to decrease long-term surprise.
We show how our ACI agent was able to quickly and traceably solve an optimization problem while fulfilling requirements.
arXiv Detail & Related papers (2023-11-17T16:03:04Z)
- Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z)
- Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions [26.125340303868335]
Current approaches suffer from large inference times.
We propose a novel training algorithm that gives accuracy competitive with inference models.
Our evaluation shows between $3\times$ and $110\times$ speedups in inference time on large models with up to $23$ million parameters.
arXiv Detail & Related papers (2023-06-14T14:38:25Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that distributed inference with HALP achieves a 1.7x inference acceleration for VGG-16.
It is shown that model selection combined with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity and associate each of them with a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network (see the minimal sketch after this list).
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices [1.6873748786804317]
We argue that running CNNs between an edge device and the cloud amounts to solving a resource-constrained optimization problem.
Experiments done on real-world edge devices show that LMOS ensures feasible execution of different CNN models at the edge.
arXiv Detail & Related papers (2021-07-19T19:39:56Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Knowledge Distillation for Mobile Edge Computation Offloading [14.417463848473494]
We propose an edge computation offloading framework based on Deep Imitation Learning (DIL) and Knowledge Distillation (KD).
Our model has the shortest inference delay among all policies.
arXiv Detail & Related papers (2020-04-09T04:58:46Z)
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
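As referenced in the contextual-bandit entry above, here is a minimal sketch of bandit-based model selection across HEC layers. It is not the cited paper's reinforcement-learning policy network: it substitutes a simple epsilon-greedy rule with per-model linear reward estimates, and every class name, dimension, and reward definition is an illustrative assumption.

```python
# Hedged sketch: contextual-bandit selection among K anomaly-detection
# models of increasing complexity, one per HEC layer. Epsilon-greedy with
# per-arm linear value estimates stands in for the paper's RL policy
# network; all names and the reward signal are illustrative assumptions.
import numpy as np

class EpsGreedySelector:
    def __init__(self, n_models, dim, eps=0.1, lr=0.01):
        self.w = np.zeros((n_models, dim))  # one linear reward model per arm
        self.eps, self.lr = eps, lr

    def select(self, x):
        # x: context features of the incoming data sample
        if np.random.rand() < self.eps:
            return np.random.randint(len(self.w))  # explore
        return int(np.argmax(self.w @ x))          # exploit

    def update(self, arm, x, reward):
        # SGD step on the squared error of the chosen arm's estimate;
        # the reward should trade off detection accuracy against
        # the latency/cost of the chosen HEC layer.
        err = reward - self.w[arm] @ x
        self.w[arm] += self.lr * err * x

# Illustrative run: 3 models (device, edge, cloud layers), 8-dim contexts.
sel = EpsGreedySelector(n_models=3, dim=8)
x = np.random.rand(8)
arm = sel.select(x)
sel.update(arm, x, reward=1.0)  # assumed reward from the environment
```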
This list is automatically generated from the titles and abstracts of the papers in this site.