PBScaler: A Bottleneck-aware Autoscaling Framework for
Microservice-based Applications
- URL: http://arxiv.org/abs/2303.14620v3
- Date: Mon, 25 Dec 2023 11:58:34 GMT
- Title: PBScaler: A Bottleneck-aware Autoscaling Framework for
Microservice-based Applications
- Authors: Shuaiyu Xie, Jian Wang, Bing Li, Zekun Zhang, Duantengchuan Li,
Patrick C. K. Hung
- Abstract summary: We propose PBScaler, a bottleneck-aware autoscaling framework for microservice-based applications.
We show that PBScaler outperforms existing approaches while conserving resources efficiently.
- Score: 6.453782169615384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoscaling is critical for ensuring optimal performance and resource
utilization in cloud applications with dynamic workloads. However, traditional
autoscaling techniques are often ill-suited to microservice-based applications
due to their diverse workload patterns and
complex interactions between microservices. Specifically, the propagation of
performance anomalies through interactions leads to a high number of abnormal
microservices, making it difficult to identify the root performance bottlenecks
(PBs) and formulate appropriate scaling strategies. In addition, to balance
resource consumption and performance, the existing mainstream approaches based
on online optimization algorithms require multiple iterations, leading to
oscillation and elevating the likelihood of performance degradation. To tackle
these issues, we propose PBScaler, a bottleneck-aware autoscaling framework
designed to prevent performance degradation in a microservice-based
application. The key insight of PBScaler is to locate the PBs precisely. To
this end, we propose TopoRank, a novel random walk algorithm based on
topological potential that reduces unnecessary scaling. By integrating TopoRank with an offline
performance-aware optimization algorithm, PBScaler optimizes replica management
without disrupting the online application. Comprehensive experiments
demonstrate that PBScaler outperforms existing state-of-the-art approaches in
mitigating performance issues while conserving resources efficiently.
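To make the bottleneck-localization idea concrete, below is a minimal sketch of a TopoRank-style personalized random walk: teleport probabilities come from a topological-potential score over the service dependency graph, and the stationary distribution ranks candidate PBs. The potential formula, function names, and parameters (`sigma`, `alpha`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def topological_potential(dist, mass, sigma=1.0):
    # Field-theoretic potential: phi(i) = sum_j mass[j] * exp(-(d_ij / sigma)^2).
    # dist is an (n, n) matrix of hop distances; mass weights anomalous services.
    return (mass[None, :] * np.exp(-(dist / sigma) ** 2)).sum(axis=1)

def toporank(adj, potential, alpha=0.85, iters=100, tol=1e-9):
    # Personalized PageRank-style walk over the anomalous call graph,
    # with the teleport vector biased toward high-potential services.
    out = np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    P = adj / out                                  # row-stochastic transitions
    v = potential / potential.sum()                # teleport distribution
    r = np.full(len(v), 1.0 / len(v))
    for _ in range(iters):
        r_next = alpha * P.T @ r + (1 - alpha) * v
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r                                       # higher score => more likely PB

# Toy 3-service chain: gateway -> backend -> database, all flagged anomalous.
adj = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
dist = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
scores = toporank(adj, topological_potential(dist, np.ones(3)))
```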
Related papers
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs [81.5049387116454]
We introduce APB, an efficient long-context inference framework.
APB uses multi-host approximate attention to enhance prefill speed.
APB achieves speedups of up to 9.2x, 4.2x, and 1.6x compared with FlashAttn, RingAttn, and StarAttn, respectively.
arXiv Detail & Related papers (2025-02-17T17:59:56Z)
- STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal features to address these challenges.
We propose STaleX, an adaptive auto-scaling framework that integrates these features to enable real-time resource adjustments.
Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload.
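As a rough illustration of the control-theoretic side of such a framework (the PID structure, gains, and utilization target below are assumptions for the sketch, not STaleX's actual controller), a per-service feedback loop might adjust replica counts toward a utilization target:

```python
class PIDAutoscaler:
    # Hypothetical PID loop mapping a utilization error to a replica change.
    def __init__(self, kp=0.6, ki=0.1, kd=0.05, target_util=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target_util
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, observed_util, replicas, dt=1.0):
        err = observed_util - self.target              # positive => overloaded
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        signal = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(1, round(replicas * (1 + signal)))  # clamp at one replica

scaler = PIDAutoscaler()
replicas = scaler.step(observed_util=0.8, replicas=3)  # scales out under load
```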
arXiv Detail & Related papers (2025-01-30T20:19:13Z)
- Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
Providing reliable, real-time remote sensing inference services is important to meet low-latency requirements.
This paper presents a deployment framework for remote sensing artificial intelligence applications, designed for Low Earth Orbit satellite constellations.
arXiv Detail & Related papers (2025-01-08T16:55:04Z)
- Neural Horizon Model Predictive Control -- Increasing Computational Efficiency with Neural Networks [0.0]
We propose a machine-learning-supported approach to model predictive control.
We approximate part of the problem horizon with a neural network while maintaining safety guarantees.
The proposed MPC scheme can be applied to a wide range of applications, including those requiring a rapid control response.
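A minimal sketch of the neural-horizon idea, assuming toy double-integrator dynamics and a quadratic stand-in for the learned tail cost (the dynamics, cost weights, and `tail_cost` are illustrative, not the paper's formulation): a short explicit horizon is optimized, and a learned terminal term approximates the remainder.

```python
import numpy as np
from scipy.optimize import minimize

# Toy double-integrator dynamics and quadratic stage costs.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.01]])

def tail_cost(x):
    # Placeholder for a trained network approximating the truncated horizon.
    return 10.0 * float(x @ x)

def mpc_step(x0, horizon=5):
    def cost(u_seq):
        x, c = x0.copy(), 0.0
        for u in u_seq:
            c += x @ Q @ x + u * R[0, 0] * u
            x = A @ x + B[:, 0] * u
        return c + tail_cost(x)       # neural horizon replaces the long tail
    res = minimize(cost, np.zeros(horizon))
    return res.x[0]                   # apply first control, receding horizon

u0 = mpc_step(np.array([1.0, 0.0]))
```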
arXiv Detail & Related papers (2024-08-19T08:13:37Z)
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
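To illustrate the selection mechanism: the paper's operator is a differentiable top-k mask trained end to end, while this sketch shows only a hard top-k at inference time, with hypothetical shapes and a synthetic score vector standing in for the scoring network.

```python
import numpy as np

def sparsek_attention(q, k, v, scores, top_k=4):
    # scores: (n_kv,) importance per KV pair, assumed to come from a small
    # scoring network; a constant number of pairs is kept per query.
    idx = np.argsort(scores)[-top_k:]
    k_sel, v_sel = k[idx], v[idx]
    logits = q @ k_sel.T / np.sqrt(q.shape[-1])    # (n_q, top_k)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # softmax over kept pairs
    return w @ v_sel                               # (n_q, d)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 16)), rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
out = sparsek_attention(q, k, v, scores=rng.normal(size=8))
```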
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
- Quantum Algorithm Exploration using Application-Oriented Performance Benchmarks [0.0]
The QED-C suite of Application-Oriented Benchmarks provides the ability to gauge performance characteristics of quantum computers.
We investigate challenges in broadening the relevance of this benchmarking methodology to applications of greater complexity.
arXiv Detail & Related papers (2024-02-14T06:55:50Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [55.08287089554127]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove that the proposed solution achieves sub-linear regret, providing a zero average optimality gap even in challenging environments.
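Guarantees of this kind are typical of bandit-style learners; as a generic stand-in (EXP3, not the paper's algorithm), one could trade off throughput and energy through a scalar reward in [0, 1]:

```python
import numpy as np

def exp3(reward_fn, n_arms, T, gamma=0.1, seed=0):
    # EXP3 over candidate vBS configurations; algorithms in this family
    # enjoy O(sqrt(T)) (sub-linear) regret bounds.
    rng = np.random.default_rng(seed)
    w = np.ones(n_arms)
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=p)
        r = reward_fn(arm, t)          # e.g. throughput minus energy cost, scaled to [0, 1]
        w[arm] *= np.exp(gamma * r / (p[arm] * n_arms))
    return int(np.argmax(w))

# Hypothetical rewards: configuration 2 is best on average.
best = exp3(lambda a, t: [0.3, 0.5, 0.8][a], n_arms=3, T=2000)
```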
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
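A toy sketch of the shared-backbone/multi-head structure (dimensions, the averaging rule, and initialization are assumptions; MEMTL's actual heads and ensembling are specified in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedBackboneEnsemble:
    # One shared feature extractor feeds several prediction heads,
    # whose outputs are ensembled (averaged here for simplicity).
    def __init__(self, d_in, d_hidden, d_out, n_heads=3):
        self.W = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
        self.heads = [rng.normal(size=(d_hidden, d_out)) / np.sqrt(d_hidden)
                      for _ in range(n_heads)]

    def forward(self, x):
        h = np.tanh(x @ self.W)                    # shared backbone features
        preds = [h @ Wh for Wh in self.heads]      # one prediction per head
        return np.mean(preds, axis=0)              # ensemble the heads

model = SharedBackboneEnsemble(d_in=8, d_hidden=16, d_out=4)
y = model.forward(np.ones((2, 8)))                 # (2, 4) ensembled predictions
```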
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
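As a hedged illustration of how a GNN can propagate load information along service dependencies (a single symmetric-normalized graph-convolution layer; DeepScaler's actual spatiotemporal model and adaptive graph learning are more involved):

```python
import numpy as np

def gcn_layer(adj, x, w):
    # One graph-convolution step over the service dependency graph:
    # add self-loops, symmetrically normalize, aggregate, project, ReLU.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

# 3 services, 4 workload features each, projected to 8 hidden channels.
rng = np.random.default_rng(0)
h = gcn_layer(adj=np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float),
              x=rng.normal(size=(3, 4)), w=rng.normal(size=(4, 8)))
```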
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
We propose adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
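To show the kind of adapter module implied (a generic bottleneck adapter with a residual connection; the sizes and zero-initialized up-projection are common conventions, not necessarily adapter-ALBERT's exact design):

```python
import numpy as np

class Adapter:
    # Bottleneck adapter: only these small matrices are trained per task,
    # so a frozen backbone's weights can be reused across tasks.
    def __init__(self, d_model=768, d_bottleneck=64, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.up = np.zeros((d_bottleneck, d_model))   # near-identity at init

    def __call__(self, h):
        return h + np.maximum(h @ self.down, 0.0) @ self.up  # residual path

h_out = Adapter()(np.ones((2, 768)))
```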
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps these limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
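Since the guarantee is analogous to GP-UCB, the core acquisition rule is worth spelling out (the posterior mean and standard deviation below are synthetic stand-ins, not outputs of JUMBO's model):

```python
import numpy as np

def ucb_acquisition(mu, sigma, beta=2.0):
    # GP-UCB rule: prefer points with high mean or high uncertainty.
    return mu + beta * sigma

# Pick the candidate with the highest upper confidence bound.
candidates = np.linspace(0, 1, 101)
mu = np.sin(3 * candidates)          # stand-in posterior mean
sigma = 0.1 + 0.2 * candidates       # stand-in posterior std
x_next = candidates[np.argmax(ucb_acquisition(mu, sigma))]
```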
arXiv Detail & Related papers (2021-06-02T05:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.