Resource-Constrained Edge AI with Early Exit Prediction
- URL: http://arxiv.org/abs/2206.07269v1
- Date: Wed, 15 Jun 2022 03:14:21 GMT
- Title: Resource-Constrained Edge AI with Early Exit Prediction
- Authors: Rongkang Dong, Yuyi Mao and Jun Zhang
- Abstract summary: We propose an early exit prediction mechanism to reduce the on-device computation overhead in a device-edge co-inference system.
Specifically, we design a low-complexity module, namely the Exit Predictor, to guide some distinctly "hard" samples to bypass the computation of the early exits.
Considering the varying communication bandwidth, we extend the early exit prediction mechanism for latency-aware edge inference.
- Score: 5.060405696893342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By leveraging data sample diversity, early-exit networks have recently
emerged as a prominent neural network architecture for accelerating the deep
learning inference process. However, the intermediate classifiers of the early
exits introduce additional computation overhead, which is unfavorable for
resource-constrained edge artificial intelligence (AI). In this paper, we
propose an early exit prediction mechanism to reduce the on-device computation
overhead in a device-edge co-inference system supported by early-exit networks.
Specifically, we design a low-complexity module, namely the Exit Predictor, to
guide some distinctly "hard" samples to bypass the computation of the early
exits. Moreover, considering the varying communication bandwidth, we extend the
early exit prediction mechanism for latency-aware edge inference, which adapts
the prediction thresholds of the Exit Predictor and the confidence thresholds
of the early-exit network via a few simple regression models. Extensive
experimental results demonstrate the effectiveness of the Exit Predictor in
achieving a better tradeoff between accuracy and on-device computation overhead
for early-exit networks. Moreover, compared with the baseline methods, the
proposed method for latency-aware edge inference attains higher inference
accuracy under different bandwidth conditions.
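The mechanism described above can be illustrated with a minimal sketch. The names and components below (the stage/head callables, the predictor gate, the toy thresholds) are hypothetical stand-ins, not the paper's actual implementation: each on-device stage produces features, a cheap Exit Predictor decides whether computing that stage's early-exit classifier is worthwhile, and samples that never exit confidently are offloaded to the edge server.

```python
def softmax(logits):
    m = max(logits)
    exps = [2.718281828459045 ** (v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_inference(x, stages, exit_heads, predictor, offload,
                         conf_threshold=0.8, pred_threshold=0.5):
    """Run on-device stages; skip early-exit heads for samples the
    Exit Predictor deems 'hard', then offload the rest to the edge."""
    feat = x
    for stage, head in zip(stages, exit_heads):
        feat = stage(feat)
        # Exit Predictor: a low-complexity gate deciding whether this
        # sample is likely to exit here at all.
        if predictor(feat) < pred_threshold:
            continue  # distinctly 'hard' sample: bypass this exit's classifier
        probs = softmax(head(feat))
        top = max(probs)
        if top >= conf_threshold:
            return probs.index(top), "on-device"  # confident early exit
    return offload(feat), "edge"  # no confident exit: edge co-inference

# Toy components: one stage that doubles the features, an identity exit
# head (features are logits), a predictor that always trusts the exit,
# and an edge fallback that returns the argmax.
stages = [lambda f: [2 * v for v in f]]
heads = [lambda f: f]
predictor = lambda f: 1.0
offload = lambda f: f.index(max(f))

label, where = early_exit_inference([0.1, 3.0], stages, heads, predictor, offload)
```

In the latency-aware extension, `conf_threshold` and `pred_threshold` would not be fixed constants but functions of the available bandwidth, fitted by the paper's simple regression models.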
Related papers
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
- Deep Subspace Encoders for Nonlinear System Identification [0.0]
We propose a method which uses a truncated prediction loss and a subspace encoder for state estimation.
We show that, under mild conditions, the proposed method is locally consistent, increases optimization stability, and achieves increased data efficiency.
arXiv Detail & Related papers (2022-10-26T16:04:38Z)
- Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking the neighbouring lines into account shows statistically significant results in comparison with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
- Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph Neural Networks for Traffic Forecasting [2.088376060651494]
We focus on a diffusion convolutional recurrent neural network (DCRNN), a state-of-the-art method for short-term traffic forecasting.
We develop a scalable deep ensemble approach to quantify uncertainties for DCRNN.
We show that our generic and scalable approach outperforms the current state-of-the-art Bayesian and a number of other commonly used frequentist techniques.
arXiv Detail & Related papers (2022-04-04T16:10:55Z)
- Consistency Training of Multi-exit Architectures for Sensor Data [0.07614628596146598]
We present a novel and architecture-agnostic approach for robust training of multi-exit architectures termed consistent exit training.
We leverage weak supervision to align model output with consistency training and jointly optimize dual-losses in a multi-task learning fashion over the exits in a network.
arXiv Detail & Related papers (2021-09-27T17:11:25Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
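The paper solves the model-selection problem with a reinforcement-learning policy network; as a simpler illustration of the underlying contextual-bandit idea, the sketch below substitutes a basic ε-greedy rule for choosing which detection model (bandit arm) to run. The function and its parameters are illustrative assumptions, not the paper's method.

```python
import random

def epsilon_greedy_select(total_rewards, counts, epsilon=0.1, rng=random):
    """Choose a bandit arm (one anomaly-detection DNN / HEC layer):
    with probability epsilon explore a random arm, otherwise exploit
    the best empirical mean reward; untried arms are tried first."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [r / c if c else float("inf")  # favour never-tried arms
             for r, c in zip(total_rewards, counts)]
    return means.index(max(means))

# With exploration turned off, the arm with the best average reward wins:
# arm 1 has mean reward 5.0 / 2 = 2.5, the highest of the three.
arm = epsilon_greedy_select([1.0, 5.0, 2.0], [2, 2, 2], epsilon=0.0)
```

In a contextual variant, the reward estimates would additionally be conditioned on observed context (e.g. input features or device load) rather than kept as global per-arm averages.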
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions [80.78077900288868]
We decompose the design methodology of early-exit networks to its key components and survey the recent advances in each one of them.
We position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.
arXiv Detail & Related papers (2021-06-09T12:33:02Z)
- HAPI: Hardware-Aware Progressive Inference [18.214367595727037]
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks.
Despite their popularity, CNN inference still comes at a high computational cost.
This work presents HAPI, a novel methodology for generating high-performance early-exit networks.
arXiv Detail & Related papers (2020-08-10T09:55:18Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
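The patience-based criterion can be sketched in a few lines. This is an illustrative reconstruction of the general idea (exit once several consecutive internal classifiers agree), not the paper's exact implementation; the function name and demo values are hypothetical.

```python
def patience_early_exit(layer_predictions, patience=3):
    """Return (prediction, exit depth): stop once `patience` consecutive
    internal classifiers agree on the same class."""
    last, streak = None, 0
    for depth, pred in enumerate(layer_predictions, start=1):
        streak = streak + 1 if pred == last else 1
        last = pred
        if streak >= patience:
            return pred, depth  # early exit: remaining layers are skipped
    return last, len(layer_predictions)  # ran the full network

# Per-layer predictions of a 5-exit model: class 0 stabilizes at layer 2,
# so three consecutive agreements are reached at layer 4.
label, depth = patience_early_exit([2, 0, 0, 0, 1], patience=3)
```

Because the stopping rule depends on agreement rather than a single classifier's confidence, it is less sensitive to one overconfident intermediate layer.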
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- Predictive Business Process Monitoring via Generative Adversarial Nets: The Case of Next Event Prediction [0.026249027950824504]
This paper proposes a novel adversarial training framework to address the problem of next event prediction.
It works by putting one neural network against the other in a two-player game which leads to predictions that are indistinguishable from the ground truth.
It systematically outperforms all baselines both in terms of accuracy and earliness of the prediction, despite using a simple network architecture and a naive feature encoding.
arXiv Detail & Related papers (2020-03-25T08:31:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.