DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices
- URL: http://arxiv.org/abs/2502.15735v1
- Date: Thu, 06 Feb 2025 09:16:54 GMT
- Title: DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices
- Authors: Xian Peng, Xin Wu, Lianming Xu, Li Wang, Aiguo Fei
- Abstract summary: We propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet quality of service requirements. We show that DistrEE realizes efficient collaborative inference, achieving an effective trade-off between inference latency and accuracy.
- Score: 13.916010072536377
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By leveraging the power of distributed computing, edge devices can perform complicated and resource-hungry inference tasks previously possible only on powerful servers, enabling new applications in areas such as autonomous vehicles, industrial automation, and smart homes. However, it is challenging to achieve accurate and efficient distributed edge inference due to the fluctuating nature of devices' actual resources and the varying processing difficulty of input data. In this work, we propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet specific quality of service requirements. In particular, the framework first integrates model early exit and distributed inference for multi-node collaborative inference scenarios. Furthermore, it designs an early exit policy to control when model inference terminates. Extensive simulation results demonstrate that DistrEE realizes efficient collaborative inference, achieving an effective trade-off between inference latency and accuracy.
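To make the abstract concrete, here is a minimal sketch of distributed inference with early exits, assuming a chain of devices that each hold one model partition plus an attached exit classifier, with a softmax-confidence threshold deciding when to stop. All names, dimensions, and the threshold policy are illustrative assumptions, not DistrEE's actual design:

```python
# Minimal sketch of distributed early-exit inference (NOT the authors'
# implementation). Each "node" runs one model partition and an exit
# classifier; a confidence threshold decides whether to stop here or
# forward intermediate features to the next node.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class NodePartition:
    """One model partition plus its early-exit head, hosted on one device."""
    def __init__(self, in_dim, out_dim, num_classes):
        self.w = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.exit_w = rng.normal(scale=0.1, size=(out_dim, num_classes))

    def forward(self, x):
        return np.tanh(x @ self.w)      # backbone features computed locally

    def exit_logits(self, h):
        return h @ self.exit_w          # early-exit classifier

def distributed_early_exit(x, nodes, threshold=0.9):
    """Run partitions in sequence; exit as soon as confidence is high enough."""
    h = x
    for i, node in enumerate(nodes):
        h = node.forward(h)             # local computation on node i
        p = softmax(node.exit_logits(h))
        if p.max() >= threshold or i == len(nodes) - 1:
            return int(p.argmax()), i   # prediction + index of the exit taken
        # otherwise: transmit intermediate features h to node i+1

nodes = [NodePartition(16, 16, 10) for _ in range(3)]
pred, exit_at = distributed_early_exit(rng.normal(size=16), nodes)
print(f"predicted class {pred}, exited at node {exit_at}")
```

In this sketch, features are handed to the next node only when the local exit is not confident enough, which is where the latency-accuracy trade-off described in the abstract comes from.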
Related papers
- Revisiting Outage for Edge Inference Systems [26.22867156180142]
We propose a theoretical framework that characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold.
Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches.
arXiv Detail & Related papers (2025-03-22T13:10:27Z)
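One way to write the InfOut quantity described in this entry, with notation assumed rather than taken from the paper:

```latex
% InfOut probability: the chance that end-to-end inference accuracy
% A_E2E falls below the target threshold A_th (notation assumed)
P_{\mathrm{InfOut}} = \Pr\left( A_{\mathrm{E2E}} < A_{\mathrm{th}} \right)
```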
- Adaptive Early Exiting for Collaborative Inference over Noisy Wireless Channels [17.890390892890057]
Collaborative inference systems are one of the emerging solutions for deploying deep neural networks (DNNs) at the wireless network edge.
In this work, we study early exiting in the context of collaborative inference, which allows inference results to be obtained at the edge device for certain samples.
The central part of our system is the transmission-decision (TD) mechanism, which decides whether to keep the early exit prediction or transmit the data to the edge server for further processing.
arXiv Detail & Related papers (2023-11-29T21:31:59Z)
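An illustrative sketch of a transmission-decision (TD) rule in the spirit of the entry above: keep the device-side early-exit prediction when it looks confident, otherwise send the sample on to the edge server. The entropy threshold and helper names are assumptions, not the paper's mechanism:

```python
# Sketch of a TD rule: low predictive entropy -> trust the early exit;
# high entropy -> offload to the edge server for further processing.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def transmission_decision(exit_probs, max_entropy=0.5):
    """Return (keep_local, prediction-or-None)."""
    if entropy(exit_probs) <= max_entropy:
        return True, int(np.argmax(exit_probs))   # keep the early-exit result
    return False, None                            # transmit to the server

keep, pred = transmission_decision(np.array([0.92, 0.05, 0.03]))
print("local" if keep else "offload", pred)
```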
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean squared error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
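A rough sketch of the shared-backbone / multi-head idea summarized above; the layer sizes, the averaging ensemble, and all names are illustrative assumptions, not MEMTL's actual design:

```python
# One backbone pass feeds several prediction heads, whose outputs are
# ensembled; this amortizes the feature extraction across all heads.
import numpy as np

rng = np.random.default_rng(1)
backbone = rng.normal(scale=0.1, size=(8, 8))                   # shared feature extractor
heads = [rng.normal(scale=0.1, size=(8, 4)) for _ in range(3)]  # prediction heads

def multi_head_infer(x):
    h = np.tanh(x @ backbone)                # single shared backbone pass
    outs = np.stack([h @ w for w in heads])  # each head scores the options
    return outs.mean(axis=0)                 # simple ensemble over heads

print(multi_head_infer(rng.normal(size=8)))
```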
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference through seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that distributed inference with HALP achieves a 1.7x inference speedup for VGG-16.
Model selection combined with HALP distributed inference is also shown to significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
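A toy sketch of node-personalized propagation depth, in the spirit of the adaptive propagation order described above; the degree-based rule here is an invented stand-in for the paper's topology-aware policy:

```python
# Each node stops aggregating neighbor features at its own depth,
# instead of running a fixed number of propagation steps for all nodes.
import numpy as np

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
deg = adj.sum(axis=1)
norm_adj = adj / deg[:, None]                 # row-normalized propagation matrix
feats = np.eye(4)                             # toy node features

# Invented personalization rule: high-degree nodes aggregate over a
# smaller neighborhood, low-degree nodes over a larger one.
steps = np.where(deg >= 2, 1, 2)

propagated = feats.copy()
out = np.zeros_like(feats)
for k in range(1, steps.max() + 1):
    propagated = norm_adj @ propagated        # one more hop of propagation
    out[steps == k] = propagated[steps == k]  # freeze nodes at their own depth
print(out)
```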
- Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices [2.28438857884398]
We propose a novel method that exploits model parallelism to separate a neural network for distributed inference.
Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated.
arXiv Detail & Related papers (2021-11-03T19:30:28Z)
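A minimal sketch of separating one layer across two devices via model parallelism, loosely following the summary above: each device holds a slice of the weight matrix and computes a partial output, so only small partial results need to be exchanged. The dimensions and the column split are illustrative assumptions:

```python
# Column-wise model parallelism for a single dense layer across two
# hypothetical devices; merging the partials recovers the full output.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=16)                  # input replicated to both devices
w = rng.normal(scale=0.1, size=(16, 8))  # full layer weight

w_dev0, w_dev1 = np.hsplit(w, 2)         # column split: 8 outputs -> 4 + 4
y_dev0 = x @ w_dev0                      # computed on device 0
y_dev1 = x @ w_dev1                      # computed on device 1
y = np.concatenate([y_dev0, y_dev1])     # "communication" step: merge partials

assert np.allclose(y, x @ w)             # identical to the single-device result
print(y.shape)
```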
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- Resolution Adaptive Networks for Efficient Inference [53.04907454606711]
We propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs.
In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations.
High-resolution paths in the network maintain the capability to recognize the "hard" samples.
arXiv Detail & Related papers (2020-03-16T16:54:36Z)
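A cartoon of the resolution-adaptive routing described above: classify with a cheap low-resolution subnetwork first and fall back to the high-resolution path only for low-confidence ("hard") inputs. The thresholds, pooling, and weights are illustrative assumptions, not RANet itself:

```python
# Two-resolution cascade: confident low-res predictions exit early;
# everything else is re-classified from a higher-resolution view.
import numpy as np

rng = np.random.default_rng(3)
w_low = rng.normal(scale=0.1, size=(16, 10))   # lightweight low-res classifier
w_high = rng.normal(scale=0.1, size=(64, 10))  # heavier high-res classifier

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def resolution_adaptive_infer(img, threshold=0.6):
    # 16x16 -> 4x4 block-mean "downsample" for the cheap path
    low = img.reshape(4, 4, 4, 4).mean(axis=(1, 3)).ravel()
    p = softmax(low @ w_low)
    if p.max() >= threshold:
        return int(p.argmax()), "low-res exit"
    # hard sample: use an 8x8 higher-resolution view
    high = img.reshape(8, 2, 8, 2).mean(axis=(1, 3)).ravel()
    p = softmax(high @ w_high)
    return int(p.argmax()), "high-res path"

print(resolution_adaptive_infer(rng.normal(size=(16, 16))))
```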
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)