Related papers: Understanding Container-based Services under Software Aging: Dependability and Performance Views

Understanding Container-based Services under Software Aging: Dependability and Performance Views

URL: http://arxiv.org/abs/2308.12784v1
Date: Thu, 24 Aug 2023 13:40:26 GMT
Title: Understanding Container-based Services under Software Aging: Dependability and Performance Views
Authors: Jing Bai, Xiaolin Chang, Fumio Machida, Kishor S. Trivedi
Abstract summary: We show the optimal con-tainer-migration trigger intervals that can maximize the de-pendability or minimize the performance of a container-based service. This paper proposes a comprehensive semi-Markov-based approach to quantitatively evaluate the effect of OS reju-venation on the dependability and the performance of a con-tainer-based service.
Score: 5.2135218089240185
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Container technology, as the key enabler behind microservice architectures, is widely applied in Cloud and Edge Computing. A long and continuous running of operating system (OS) host-ing container-based services can encounter software aging that leads to performance deterioration and even causes system fail-ures. OS rejuvenation techniques can mitigate the impact of software aging but the rejuvenation trigger interval needs to be carefully determined to reduce the downtime cost due to rejuve-nation. This paper proposes a comprehensive semi-Markov-based approach to quantitatively evaluate the effect of OS reju-venation on the dependability and the performance of a con-tainer-based service. In contrast to the existing studies, we nei-ther restrict the distributions of time intervals of events to be exponential nor assume that backup resources are always avail-able. Through the numerical study, we show the optimal con-tainer-migration trigger intervals that can maximize the de-pendability or minimize the performance of a container-based service.

Related papers

The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large computation model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications.<n>In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z)
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference [49.77734021302196]
We propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features. Results show that TOFC achieves up to 60% reduction in data transmission overhead and 50% reduction in system latency.
arXiv Detail & Related papers (2025-03-17T08:37:22Z)
Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
It is important to provide reliable real-time remote sensing inference services to meet the low-latency requirements. This paper presents a remote sensing artificial intelligence applications deployment framework designed for Low Earth Orbit satellite constellations.
arXiv Detail & Related papers (2025-01-08T16:55:04Z)
Split Learning in Computer Vision for Semantic Segmentation Delay Minimization [25.0679083637967]
We propose a novel approach to minimize the inference delay in semantic segmentation using split learning (SL) SL is tailored to the needs of real-time computer vision (CV) applications for resource-constrained devices.
arXiv Detail & Related papers (2024-12-18T19:07:25Z)
SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management [2.707215971599082]
Emerging applications, such as augmented reality, require low-latency computing services with high reliability on user devices. We introduce SafeTail, a framework that meets both median and tail response time targets, with tail latency defined as latency beyond the 90th percentile threshold.
arXiv Detail & Related papers (2024-08-30T10:17:37Z)
The Fusion of Deep Reinforcement Learning and Edge Computing for Real-time Monitoring and Control Optimization in IoT Environments [2.0380092516669235]
This paper proposes an optimization control system based on deep reinforcement learning and edge computing. Results demonstrate that this approach reduces cloud-edge communication latency, accelerates response to abnormal situations, reduces system failure rates, extends average equipment operating time, and saves costs for manual maintenance and replacement.
arXiv Detail & Related papers (2024-02-28T12:01:06Z)
Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation. We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z)
TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime. This paper introduces an integrated framework that leverages the capabilities of the Transformer model-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z)
DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach. It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency. Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservice.
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
Dynamic Scheduling for Federated Edge Learning with Streaming Data [56.91063444859008]
We consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints. Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration.
arXiv Detail & Related papers (2023-05-02T07:41:16Z)
Time-sensitive Learning for Heterogeneous Federated Edge Intelligence [52.83633954857744]
We investigate real-time machine learning in a federated edge intelligence (FEI) system. FEI systems exhibit heterogenous communication and computational resource distribution. We propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model.
arXiv Detail & Related papers (2023-01-26T08:13:22Z)
Innovations in the field of on-board scheduling technologies [64.41511459132334]
This paper proposes an onboard scheduler, that integrates inside an onboard software framework for mission autonomy. The scheduler is based on linear integer programming and relies on the use of a branch-and-cut solver. The technology has been tested on an Earth Observation scenario, comparing its performance against the state-of-the-art scheduling technology.
arXiv Detail & Related papers (2022-05-04T12:00:49Z)
CAROL: Confidence-Aware Resilience Model for Edge Federations [13.864161788250856]
We present a confidence aware resilience model, CAROL, that utilizes a memory-efficient generative neural network to predict the Quality of Service (QoS) for a future state and a confidence score for each prediction. CAROL outperforms state-of-the-art resilience schemes by reducing the energy consumption, deadline violation rates and resilience overheads by up to 16, 17 and 36 percent, respectively.
arXiv Detail & Related papers (2022-03-14T14:37:31Z)
Queue-Learning: A Reinforcement Learning Approach for Providing Quality of Service [1.8477401359673706]
Servicerate control is a common mechanism for providing guarantees in service systems. In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller. Our controller provides explicit probabilistic guarantees on the end-to-end delay of the system.
arXiv Detail & Related papers (2021-01-12T17:28:57Z)
Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks. We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies. We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.