Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources
- URL: http://arxiv.org/abs/2409.03103v1
- Date: Wed, 4 Sep 2024 22:03:07 GMT
- Title: Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources
- Authors: Amadou Ba, Pavithra Harsha, Chitra Subramanian,
- Abstract summary: We develop a model that captures the relationship between an end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of our approach with a microservice-based application and provide a roadmap to deployment.
- Score: 1.1470070927586018
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern web services adopt cloud-native principles to leverage the advantages of microservices. To consistently guarantee high Quality of Service (QoS) according to Service Level Agreements (SLAs), ensure satisfactory user experiences, and minimize operational costs, each microservice must be provisioned with the right amount of resources. However, accurately provisioning microservices with adequate resources is complex and depends on many factors, including workload intensity and the complex interconnections between microservices. To address this challenge, we develop a model that captures the relationship between an end-to-end latency, requests at the front-end level, and resource utilization. We then use the developed model to predict the end-to-end latency. Our solution leverages the Temporal Fusion Transformer (TFT), an attention-based architecture equipped with interpretability features. When the prediction results indicate SLA non-compliance, we use the feature importance provided by the TFT as covariates in Kernel Ridge Regression (KRR), with the response variable being the desired latency, to learn the parameters associated with the feature importance. These learned parameters reflect the adjustments required to the features to ensure SLA compliance. We demonstrate the merit of our approach with a microservice-based application and provide a roadmap to deployment.
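The abstract's second stage (feature importances used as covariates in Kernel Ridge Regression, with latency as the response) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a linear kernel, so the dual KRR solution maps back to one coefficient per feature. All inputs are hypothetical: the TFT importance weights (`importance`), the toy latencies generated from a known rule, the SLA threshold (`sla_ms`), and the stand-in forecast (`forecast_ms`). The TFT forecasting step itself is not reproduced. The helper names (`solve`, `fit_krr`, `primal_weights`, `predict`) are illustrative.

```python
# Minimal kernel ridge regression (KRR) sketch in pure Python.
# Assumptions (not from the paper): linear kernel, hypothetical TFT
# importance weights, toy latency data, and a fixed SLA threshold.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def linear_kernel(u, v):
    return sum(p * q for p, q in zip(u, v))


def fit_krr(x, y, lam):
    """Dual coefficients alpha = (K + lam * I)^-1 y."""
    n = len(x)
    k = [[linear_kernel(x[i], x[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(k, y)


def primal_weights(x, alpha):
    """With a linear kernel, w = X^T alpha gives one coefficient per feature."""
    return [sum(a * row[j] for a, row in zip(alpha, x)) for j in range(len(x[0]))]


def predict(x_train, alpha, x_new):
    return sum(a * linear_kernel(row, x_new) for a, row in zip(alpha, x_train))


# Hypothetical raw features per row: (front-end requests, CPU utilization).
importance = [0.7, 0.3]  # hypothetical TFT feature importances
raw = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
x = [[imp * f for imp, f in zip(importance, row)] for row in raw]
y = [2.0 * row[0] + 3.0 * row[1] for row in x]  # toy latencies from a known rule

sla_ms = 10.0       # hypothetical SLA latency threshold
forecast_ms = 11.5  # stand-in for a TFT latency forecast
if forecast_ms > sla_ms:  # predicted SLA non-compliance triggers the KRR fit
    alpha = fit_krr(x, y, lam=1e-3)
    w = primal_weights(x, alpha)
    print("per-feature weights:", [round(v, 3) for v in w])
```

With a small `lam`, the recovered weights land close to the rule used to generate the toy data (about 2 and 3). A larger `lam` shrinks them toward zero. Swapping `linear_kernel` for an RBF kernel gives nonlinear KRR, but then there is no per-feature weight vector to read off directly.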
Related papers
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.
One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the small language model (SLM) returns a low-confidence response.
We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs across more than 1,500 settings.
(2025-02-06T18:59:11Z)
- STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal features to address these challenges.
We propose an adaptive auto-scaling framework, STaleX, that integrates spatiotemporal features, enabling real-time resource adjustments.
Our framework accounts for spatial features, including service specifications and dependencies among services, as well as temporal variations in workload.
(2025-01-30T20:19:13Z)
- Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
It is important to provide reliable real-time remote sensing inference services that meet low-latency requirements.
This paper presents a deployment framework for remote sensing artificial intelligence applications, designed for Low Earth Orbit satellite constellations.
(2025-01-08T16:55:04Z)
- Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA)-enabled hierarchical federated learning (HFL) system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks in improving HFL performance and reducing total cost.
(2023-11-03T13:34:44Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on handling service dependencies to optimize service-level agreement (SLA) assurance and cost efficiency.
Experimental results demonstrate that the method implements a more effective autoscaling mechanism for microservices.
(2023-09-02T08:22:21Z)
- TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features [0.5161531917413706]
Temporal QoS prediction is essential for identifying a suitable service over time.
Recent methods rarely achieve the desired accuracy due to various limitations.
This paper proposes a scalable strategy for Temporal QoS Prediction using Multi-source Collaborative Features (TPMCF).
(2023-03-30T06:49:53Z)
- Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem, addressing the PP issue by injecting noise into the generated offloading decisions.
(2023-02-09T12:50:18Z)
- A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network Function Virtualization (NFV)-enabled paradigm.
We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
(2022-11-10T23:22:05Z)
- Federated Learning with Correlated Data: Taming the Tail for Age-Optimal Industrial IoT [55.62157530259969]
We study the minimization of a sensor's transmit power subject to a peak age-of-information (AoI) requirement and a probabilistic constraint on queuing latency.
We propose a local-model selection approach that accounts for correlation among the sensor's training data.
Numerical results show the tradeoff among transmit power, peak AoI, and the tail distribution of delay.
(2021-08-17T08:38:31Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider deep neural network (DNN) models, which can be trained using PARTEL by introducing some auxiliary variables.
(2020-10-08T15:27:50Z)