Related papers: SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management

URL: http://arxiv.org/abs/2408.17171v1
Date: Fri, 30 Aug 2024 10:17:37 GMT
Title: SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management
Authors: Jyoti Shokhanda, Utkarsh Pal, Aman Kumar, Soumi Chattopadhyay, Arani Bhattacharya
Abstract summary: Emerging applications, such as augmented reality, require low-latency computing services with high reliability on user devices. We introduce SafeTail, a framework that meets both median and tail response time targets, with tail latency defined as latency beyond the 90th percentile threshold.
Score: 2.707215971599082
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optimizing tail latency while efficiently managing computational resources is crucial for delivering high-performance, latency-sensitive services in edge computing. Emerging applications, such as augmented reality, require low-latency computing services with high reliability on user devices, which often have limited computational capabilities. Consequently, these devices depend on nearby edge servers for processing. However, inherent uncertainties in network and computation latencies stemming from variability in wireless networks and fluctuating server loads make timely service delivery challenging. Existing approaches often focus on optimizing median latency but fall short of addressing the specific challenges of tail latency in edge environments, particularly under uncertain network and computational conditions. Although some methods do address tail latency, they typically rely on fixed or excessive redundancy and lack adaptability to dynamic network conditions, often being designed for cloud environments rather than the unique demands of edge computing. In this paper, we introduce SafeTail, a framework that meets both median and tail response time targets, with tail latency defined as latency beyond the 90th percentile threshold. SafeTail addresses this challenge by selectively replicating services across multiple edge servers to meet target latencies. SafeTail employs a reward-based deep learning framework to learn optimal placement strategies, balancing the need to achieve target latencies with minimizing additional resource usage. Through trace-driven simulations, SafeTail demonstrated near-optimal performance and outperformed most baseline strategies across three diverse services.
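Illustration: the abstract's core idea, that replicating a request across several edge servers can pull in the tail beyond the 90th percentile at the cost of extra resources, can be sketched with a small simulation. This is not SafeTail's method (which learns placement with a reward-based deep learning framework). It is a minimal sketch that assumes independent, log-normally distributed server latencies with made-up parameters. The helper names `percentile` and `simulate` are hypothetical.

```python
import random


def percentile(samples, q):
    """Return the q-th percentile (0-100) using the nearest-rank method."""
    ordered = sorted(samples)
    # Ceiling division gives the 1-based nearest rank.
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]


def simulate(num_requests, replicas, rng, mu=3.0, sigma=0.8):
    """Simulate response times when each request is sent to `replicas` servers.

    Assumption: each server's latency (ms) is an independent log-normal draw,
    and the user sees the fastest replica's response.
    """
    return [
        min(rng.lognormvariate(mu, sigma) for _ in range(replicas))
        for _ in range(num_requests)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for replicas in (1, 2, 3):
        latencies = simulate(10_000, replicas, rng)
        print(
            f"replicas={replicas} "
            f"median={percentile(latencies, 50):.1f} ms "
            f"p90={percentile(latencies, 90):.1f} ms"
        )
```

Under these assumptions each extra replica lowers both the median and the 90th percentile, but with diminishing returns while resource usage grows linearly, which is the trade-off that motivates replicating selectively rather than with a fixed level of redundancy.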

Related papers

Service Placement in Small Cell Networks Using Distributed Best Arm Identification in Linear Bandits [11.92409456846963]
Small base stations (SBSs) serve as edge servers to enable low-latency service delivery. Limited edge capacity makes it challenging to decide which services to deploy locally versus in the cloud. We propose a distributed and adaptive multi-agent best-arm identification (BAI) algorithm under a fixed-confidence setting.
arXiv Detail & Related papers (2025-06-22T12:45:01Z)
Interference-Aware Edge Runtime Prediction with Conformal Matrix Completion [10.776912158818437]
Accurately estimating workload runtime is a longstanding goal in computer systems. We develop a matrix factorization-inspired method that generates accurate interference-aware predictions with tight provably-guaranteed uncertainty bounds. We validate our method on a novel WebAssembly runtime dataset collected from 24 unique devices, achieving a prediction error of 5.2% -- 2x better than a naive application of existing methods.
arXiv Detail & Related papers (2025-03-09T03:41:32Z)
Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks. It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. In this paper, we introduce model pruning for hierarchical FL (HFL) in wireless networks to reduce the neural network scale. We show that our proposed HFL with model pruning achieves learning accuracy similar to that of HFL without model pruning and reduces communication cost by about 50 percent.
arXiv Detail & Related papers (2023-05-15T22:04:49Z)
Dynamic Scheduling for Federated Edge Learning with Streaming Data [56.91063444859008]
We consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints. Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration.
arXiv Detail & Related papers (2023-05-02T07:41:16Z)
Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning [11.007816552466952]
This paper focuses on the problem of scheduling inference queries on Deep Neural Networks in edge networks at short timescales. By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP. We design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions.
arXiv Detail & Related papers (2023-01-31T13:23:34Z)
An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time-Sensitive Networking (TSN) has recently been studied as a way to realize low-latency communication via deterministic scheduling. A non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for time-sensitive flows. Experimental results demonstrate that NDS/DQS can support deterministic ultra-low latency services well and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z)
CAROL: Confidence-Aware Resilience Model for Edge Federations [13.864161788250856]
We present a confidence aware resilience model, CAROL, that utilizes a memory-efficient generative neural network to predict the Quality of Service (QoS) for a future state and a confidence score for each prediction. CAROL outperforms state-of-the-art resilience schemes by reducing the energy consumption, deadline violation rates and resilience overheads by up to 16, 17 and 36 percent, respectively.
arXiv Detail & Related papers (2022-03-14T14:37:31Z)
Variational Autoencoders for Reliability Optimization in Multi-Access Edge Computing Networks [36.54164679645639]
Multi-access edge computing (MEC) is viewed as an integral part of future wireless networks to support new applications with stringent service reliability and latency requirements. Guaranteeing ultra-reliable and low-latency MEC is very challenging due to uncertainties of wireless links, limited communications and computing resources, as well as dynamic network traffic. Enabling URLL MEC mandates taking into account the statistics of the end-to-end (E2E) latency and reliability across the wireless and edge computing systems.
arXiv Detail & Related papers (2022-01-25T01:20:37Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
Dynamic Compression Ratio Selection for Edge Inference Systems with Hard Deadlines [9.585931043664363]
We propose a dynamic compression ratio selection scheme for edge inference systems with hard deadlines. Information augmentation, which retransmits less compressed data for tasks with erroneous inference, is proposed to enhance accuracy. Considering wireless transmission errors, we further design a retransmission scheme to reduce performance degradation due to packet losses.
arXiv Detail & Related papers (2020-05-25T17:11:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.