Revisiting Outage for Edge Inference Systems
- URL: http://arxiv.org/abs/2504.03686v2
- Date: Mon, 28 Apr 2025 06:14:26 GMT
- Title: Revisiting Outage for Edge Inference Systems
- Authors: Zhanwei Wang, Qunsong Zeng, Haotian Zheng, Kaibin Huang
- Abstract summary: We propose a theoretical framework that characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches.
- Score: 26.22867156180142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resultant platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the mission-critical and time-sensitive nature of these tasks, it is essential to design edge inference systems that are both reliable and capable of meeting stringent end-to-end (E2E) latency constraints. Existing studies, which primarily focus on communication reliability as characterized by channel outage probability, may fail to guarantee E2E performance, specifically in terms of E2E inference accuracy and latency. To address this limitation, we propose a theoretical framework that introduces and mathematically characterizes the inference outage (InfOut) probability, which quantifies the likelihood that the E2E inference accuracy falls below a target threshold. Under an E2E latency constraint, this framework establishes a fundamental tradeoff between communication overhead (i.e., uploading more sensor observations) and inference reliability as quantified by the InfOut probability. To find a tractable way to optimize this tradeoff, we derive accurate surrogate functions for InfOut probability by applying a Gaussian approximation to the distribution of the received discriminant gain. Experimental results demonstrate the superiority of the proposed design over conventional communication-centric approaches in terms of E2E inference reliability.
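The abstract describes the InfOut probability as the likelihood that inference accuracy falls below a target, and a surrogate obtained by a Gaussian approximation to the received discriminant gain. The following is only a toy sketch of that idea, not the paper's model. It assumes (hypothetically) that each of `n_obs` uploaded observations contributes a fixed gain `gain_per_obs` if it survives an independent channel outage with probability `p_channel_outage`, and that an inference outage occurs when the total received gain falls below `gain_threshold`. All function and parameter names are illustrative.

```python
import math
import random


def gaussian_cdf(x):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def infout_gaussian(n_obs, gain_per_obs, p_channel_outage, gain_threshold):
    """Gaussian surrogate for P(received gain < threshold).

    Toy assumption: received gain = gain_per_obs * Binomial(n_obs, 1 - p),
    approximated by a normal with the same mean and variance.
    """
    p = p_channel_outage
    mean = n_obs * (1.0 - p) * gain_per_obs
    var = n_obs * p * (1.0 - p) * gain_per_obs ** 2
    if var == 0.0:
        # Degenerate case: the received gain is deterministic.
        return 1.0 if mean < gain_threshold else 0.0
    return gaussian_cdf((gain_threshold - mean) / math.sqrt(var))


def infout_monte_carlo(n_obs, gain_per_obs, p_channel_outage,
                       gain_threshold, trials=20000, seed=0):
    """Empirical estimate of the same probability under the toy model."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Count observations that survive the channel in this trial.
        received = sum(1 for _ in range(n_obs)
                       if rng.random() >= p_channel_outage)
        if received * gain_per_obs < gain_threshold:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    for n in (25, 30, 40):
        approx = infout_gaussian(n, 1.0, 0.2, 22.0)
        empirical = infout_monte_carlo(n, 1.0, 0.2, 22.0)
        print(n, round(approx, 4), round(empirical, 4))
```

Under these assumptions, uploading more observations lowers the outage estimate but costs more transmissions, which loosely mirrors the tradeoff between communication overhead and inference reliability that the abstract mentions.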
Related papers
- DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework designed for ease of scaling up.
It is composed of three unified operations: task self-attention, sensor cross-attention, and temporal cross-attention.
It achieves state-of-the-art performance on both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes, with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z)
- DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices [13.916010072536377]
We propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet quality of service requirements. We show that DistrEE realizes efficient collaborative inference, achieving an effective trade-off between inference latency and accuracy.
arXiv Detail & Related papers (2025-02-06T09:16:54Z)
- Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks [55.467288506826755]
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks.
Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance.
We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
arXiv Detail & Related papers (2025-01-20T04:26:21Z)
- Learning Robust Representations for Communications over Interference-limited Channels [0.6990493129893111]
This study introduces two highly effective methodologies, namely TwinNet and SiameseNet, for the design of encoders and decoders for block transmission and detection in interference-limited environments.
arXiv Detail & Related papers (2024-10-13T09:09:21Z)
- Reliable Federated Disentangling Network for Non-IID Domain Feature [62.73267904147804]
In this paper, we propose a novel reliable federated disentangling network, termed RFedDis.
To the best of our knowledge, our proposed RFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling.
Our proposed RFedDis provides outstanding performance with a high degree of reliability as compared to other state-of-the-art FL approaches.
arXiv Detail & Related papers (2023-01-30T11:46:34Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial-intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- Variational Autoencoders for Reliability Optimization in Multi-Access Edge Computing Networks [36.54164679645639]
Multi-access edge computing (MEC) is viewed as an integral part of future wireless networks to support new applications with stringent service reliability and latency requirements.
Guaranteeing ultra-reliable and low-latency (URLL) MEC is very challenging due to uncertainties of wireless links, limited communications and computing resources, as well as dynamic network traffic.
Enabling URLL MEC mandates taking into account the statistics of the end-to-end (E2E) latency and reliability across the wireless and edge computing systems.
arXiv Detail & Related papers (2022-01-25T01:20:37Z)
- Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks [17.162534445528827]
This work focuses on improving uncertainty estimation in the field of object classification from RGB images.
We employ a Bayesian neural network (BNN) and evaluate two practical inference techniques to obtain better uncertainty estimates.
We show a performance increase using more reliable uncertainty estimates as unary potentials within a Conditional Random Field.
arXiv Detail & Related papers (2021-09-27T08:40:19Z)
- Quantized Federated Learning under Transmission Delay and Outage Constraints [30.892724364965005]
Federated learning is a viable distributed learning paradigm which trains a machine learning model collaboratively with massive mobile devices in the wireless edge.
In practical systems with limited radio resources, transmission of a large number of model parameters inevitably suffers from quantization errors (QE) and transmission outage (TO).
We propose a robust FL scheme, named FedTOE, which performs joint allocation of wireless resources and quantization bits across the clients to minimize the QE while making the clients have the same TO probability.
arXiv Detail & Related papers (2021-06-17T11:29:12Z)
- Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.
These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem.
We introduce an uncertainty-aware deep SOD network, and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)