Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources
- URL: http://arxiv.org/abs/2409.03103v1
- Date: Wed, 4 Sep 2024 22:03:07 GMT
- Title: Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources
- Authors: Amadou Ba, Pavithra Harsha, Chitra Subramanian,
- Abstract summary: We develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of our approach with a microservice-based application and provide a roadmap to deployment.
- Score: 1.1470070927586018
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern web services adopt cloud-native principles to leverage the advantages of microservices. To consistently guarantee high Quality of Service (QoS) according to Service Level Agreements (SLAs), ensure satisfactory user experiences, and minimize operational costs, each microservice must be provisioned with the right amount of resources. However, accurately provisioning microservices with adequate resources is complex and depends on many factors, including workload intensity and the complex interconnections between microservices. To address this challenge, we develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization. We then use the developed model to predict the end-to-end latency. Our solution leverages the Temporal Fusion Transformer (TFT), an attention-based architecture equipped with interpretability features. When the prediction results indicate SLA non-compliance, we use the feature importance provided by the TFT as covariates in Kernel Ridge Regression (KRR), with the response variable being the desired latency, to learn the parameters associated with the feature importance. These learned parameters reflect the adjustments required to the features to ensure SLA compliance. We demonstrate the merit of our approach with a microservice-based application and provide a roadmap to deployment.
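To make the two-stage pipeline concrete, here is a minimal Python sketch of the idea: a forecaster predicts end-to-end latency, and where the prediction breaches the SLA, Kernel Ridge Regression is fit with importance-weighted features against the desired latency. The TFT itself is stood in for by placeholder predictions and a placeholder `feature_importance` vector (in practice these would come from a trained model, e.g. the variable-selection weights a TFT implementation exposes); all variable names and constants are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the predict-then-correct idea, under stated assumptions.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n_samples, n_features = 200, 4                 # e.g. requests, cpu, mem, queue depth
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
feature_importance = np.array([0.45, 0.30, 0.15, 0.10])  # placeholder TFT importances

# Stage 1 (stand-in for the TFT): predicted end-to-end latency per time window.
predicted_latency = X @ np.array([120.0, 80.0, 40.0, 20.0]) + rng.normal(0, 5, n_samples)

sla_latency = 150.0                             # desired latency bound from the SLA
violating = predicted_latency > sla_latency     # windows needing proactive scaling

# Stage 2: regress the *desired* latency on importance-weighted features.
# The learned mapping indicates how the dominant features must shift
# (e.g. lower utilization via scaling out) to restore SLA compliance.
krr = KernelRidge(alpha=1.0, kernel="rbf", gamma=0.5)
krr.fit(X[violating] * feature_importance, np.full(violating.sum(), sla_latency))

adjusted = krr.predict(X[violating] * feature_importance)
print(f"{violating.sum()} violating windows; mean regressed latency "
      f"{adjusted.mean():.1f} ms vs SLA {sla_latency} ms")
```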
Related papers
- Online Client Scheduling and Resource Allocation for Efficient Federated Edge Learning [9.451084740123198]
Federated learning (FL) enables edge devices to collaboratively train a machine learning model without sharing their raw data.
However, deploying FL over mobile edge networks with constrained resources such as power and bandwidth suffers from high training latency and low model accuracy.
This paper investigates the optimal client scheduling and resource allocation for FL over mobile edge networks under resource constraints and uncertainty.
arXiv Detail & Related papers (2024-09-29T01:56:45Z) - SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z) - Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z) - DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreement (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv Detail & Related papers (2023-09-02T08:22:21Z) - Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets.
In this paper, we introduce model pruning for hierarchical federated learning (HFL) in wireless networks to reduce the neural network scale.
We show that our proposed HFL with model pruning achieves similar learning accuracy compared with the HFL without model pruning and reduces communication cost by about 50 percent.
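As a rough illustration of where a communication saving of roughly 50 percent can come from, the sketch below applies generic magnitude pruning to a weight update so that only the surviving entries need to be uploaded; this is a stand-in for the general idea, not the paper's specific pruning scheme, and all numbers are made up.

```python
# Generic magnitude pruning: zero the smallest-magnitude weights so that
# only the surviving values (plus indices) need to be communicated.
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(1)
w = rng.normal(size=10_000)                 # stand-in for one layer's update
w_pruned = prune_by_magnitude(w, sparsity=0.5)

kept = np.count_nonzero(w_pruned)
print(f"kept {kept}/{w.size} weights -> ~{100 * (1 - kept / w.size):.0f}% fewer values sent")
```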
arXiv Detail & Related papers (2023-05-15T22:04:49Z) - TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features [0.5161531917413706]
Temporal QoS prediction is essential to identify a suitable service over time.
Recent methods have struggled to achieve the desired accuracy due to various limitations.
This paper proposes a scalable strategy for temporal QoS prediction using multi-source collaborative features.
arXiv Detail & Related papers (2023-03-30T06:49:53Z) - Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy which jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
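The noise-injection idea can be illustrated with a generic report-noisy-max selection over per-action Q-values, sketched below; this conveys the flavor of perturbing offloading decisions for privacy, not the DP-DQO algorithm itself, and it assumes the Q-values have a known bounded sensitivity `sens`.

```python
# Generic report-noisy-max over Q-values: Laplace noise hides which action
# truly maximizes the Q-function, so the decision stream leaks less about
# queue state. Illustrative only; not the paper's DP-DQO algorithm.
import numpy as np

def noisy_offloading_decision(q_values, epsilon, sens=1.0, rng=None):
    """Pick an offloading action via report-noisy-max with Laplace noise."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=2.0 * sens / epsilon, size=q_values.shape)
    return int(np.argmax(q_values + noise))

# Hypothetical Q-values for actions {process locally, offload to ES, drop}.
q = np.array([3.2, 2.9, 1.5])
print(noisy_offloading_decision(q, epsilon=0.5))
```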
arXiv Detail & Related papers (2023-02-09T12:50:18Z) - An ADMM-Incorporated Latent Factorization of Tensors Method for QoS Prediction [2.744577504320494]
Quality of service (QoS) describes the performance of a web service dynamically with respect to the service requested by the service consumer.
Latent factorization of tensors (LFT) is very effective for discovering temporal patterns in high-dimensional and sparse (HiDS) tensors.
Current LFT models suffer from a low convergence rate and rarely account for the effects of outliers.
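For intuition, the sketch below fits a bare-bones CP-style latent factorization of a (user, service, time) QoS tensor by stochastic gradient descent over observed entries only; it illustrates the general LFT idea rather than the ADMM-incorporated model this entry proposes, and all dimensions and hyperparameters are invented.

```python
# Bare-bones CP-style latent factorization of a sparse (user, service, time)
# QoS tensor, trained by SGD on observed entries only. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(2)
I, J, K, R = 30, 20, 10, 4                     # users, services, time slots, rank
U, S, T = (rng.normal(0.1, 0.05, (d, R)) for d in (I, J, K))

# Sparse observations of a HiDS tensor: (user, service, time, qos) tuples.
obs = [(rng.integers(I), rng.integers(J), rng.integers(K), rng.uniform(0.1, 2.0))
       for _ in range(500)]

lr, reg = 0.05, 0.01
for epoch in range(50):
    for i, j, k, y in obs:
        pred = np.sum(U[i] * S[j] * T[k])      # CP model: sum_r U[i,r]*S[j,r]*T[k,r]
        err = y - pred
        U[i] += lr * (err * S[j] * T[k] - reg * U[i])
        S[j] += lr * (err * U[i] * T[k] - reg * S[j])
        T[k] += lr * (err * U[i] * S[j] - reg * T[k])

rmse = np.sqrt(np.mean([(y - np.sum(U[i] * S[j] * T[k])) ** 2 for i, j, k, y in obs]))
print(f"training RMSE on observed entries: {rmse:.3f}")
```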
arXiv Detail & Related papers (2022-12-03T12:35:48Z) - A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network Function Virtualization (NFV) enabled paradigm.
We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
arXiv Detail & Related papers (2022-11-10T23:22:05Z) - Federated Learning with Correlated Data: Taming the Tail for Age-Optimal Industrial IoT [55.62157530259969]
We study a sensor's transmit power minimization subject to the peak-AoI requirement and a probabilistic constraint on queuing latency.
We propose a local-model selection approach which accounts for correlation among the sensor's training data.
Numerical results show the tradeoff between the transmit power, peak AoI, and delay's tail distribution.
arXiv Detail & Related papers (2021-08-17T08:38:31Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.