Related papers: OptScaler: A Hybrid Proactive-Reactive Framework for Robust Autoscaling in the Cloud

OptScaler: A Hybrid Proactive-Reactive Framework for Robust Autoscaling in the Cloud

URL: http://arxiv.org/abs/2311.12864v1
Date: Thu, 26 Oct 2023 04:38:48 GMT
Title: OptScaler: A Hybrid Proactive-Reactive Framework for Robust Autoscaling in the Cloud
Authors: Ding Zou, Wei Lu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Xiaojin Wang, Kangyu Liu, Haiqing Wang, Kefan Wang, Renen Sun
Abstract summary: Autoscaling is a vital mechanism in cloud computing that supports the autonomous adjustment of computing resources under dynamic workloads. Existing proactive autoscaling methods anticipate the future workload and scale the resources in advance, whereas reactive methods rely on real-time system feedback. This paper presents OptScaler, a hybrid autoscaling framework that integrates the power of both proactive and reactive methods for regulating CPU utilization.
Score: 11.340252931723063
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoscaling is a vital mechanism in cloud computing that supports the autonomous adjustment of computing resources under dynamic workloads. A primary goal of autoscaling is to stabilize resource utilization at a desirable level, thus reconciling the need for resource-saving with the satisfaction of Service Level Objectives (SLOs). Existing proactive autoscaling methods anticipate the future workload and scale the resources in advance, whereas the reliability may suffer from prediction deviations arising from the frequent fluctuations and noise of cloud workloads; reactive methods rely on real-time system feedback, while the hysteretic nature of reactive methods could cause violations of the rigorous SLOs. To this end, this paper presents OptScaler, a hybrid autoscaling framework that integrates the power of both proactive and reactive methods for regulating CPU utilization. Specifically, the proactive module of OptScaler consists of a sophisticated workload prediction model and an optimization model, where the former provides reliable inputs to the latter for making optimal scaling decisions. The reactive module provides a self-tuning estimator of CPU utilization to the optimization model. We embed Model Predictive Control (MPC) mechanism and robust optimization techniques into the optimization model to further enhance its reliability. Numerical results have demonstrated the superiority of both the workload prediction model and the hybrid framework of OptScaler in the scenario of online services compared to prevalent reactive, proactive, or hybrid autoscalers. OptScaler has been successfully deployed at Alipay, supporting the autoscaling of applets in the world-leading payment platform.

Related papers

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding [71.654781631463]
ReAgent-V is a novel agentic video understanding framework.<n>It integrates efficient frame selection with real-time reward generation during inference.<n>Extensive experiments on 12 datasets demonstrate significant gains in generalization and reasoning.
arXiv Detail & Related papers (2025-06-02T04:23:21Z)
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [50.19188692497892]
Traditional alignment methods often require retraining large pretrained models.<n>We propose a novel textitResidual Alignment Model (textitRAM) that formalizes the alignment process as a type of importance sampling.<n>We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z)
LAPSO: A Unified Optimization View for Learning-Augmented Power System Operations [3.754570687412345]
This paper proposes a holistic framework of Learning-Augmented Power System Operations (LAPSO)<n>LAPSO is centered on the operation stage and aims to break the boundary between temporally siloed power system tasks.<n>A dedicated Python package-lapso is introduced to automatically augment existing power system optimization models with learnable components.
arXiv Detail & Related papers (2025-05-08T13:00:24Z)
Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization [74.92515821144484]
Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. This paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models.
arXiv Detail & Related papers (2025-04-25T04:07:21Z)
Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling [1.3689475854650441]
This study proposes a comprehensive scalability optimization framework for cloud AI inference services. The proposed model is a hybrid approach that combines reinforcement learning for adaptive load distribution and deep neural networks for accurate demand forecasting. Experimental results demonstrate that the proposed model enhances load balancing efficiency by 35 and reduces response delay by 28.
arXiv Detail & Related papers (2025-04-16T04:00:04Z)
Modular Framework for Uncertainty Prediction in Autonomous Vehicle Motion Forecasting within Complex Traffic Scenarios [6.6603214802161546]
Our framework employs a flexible, end-to-end differentiable probabilistic encoder-decoder architecture. Key contributions include a probabilistic heatmap predictor that generates context-aware occupancy grids for dynamic forecasting. We benchmark it against an end-to-end baseline, demonstrating faster convergence, improved stability, and flexibility.
arXiv Detail & Related papers (2025-01-27T20:21:18Z)
Microservices-Based Framework for Predictive Analytics and Real-time Performance Enhancement in Travel Reservation Systems [1.03590082373586]
The paper presents a framework of architecture dedicated to enhancing the performance of real-time travel reservation systems. Our framework includes real-time predictive analytics, through machine learning models, that optimize forecasting customer demand, dynamic pricing, as well as system performance. Future work will be an investigation of advanced AI models and edge processing to further improve the performance and robustness of the systems employed.
arXiv Detail & Related papers (2024-12-20T07:19:42Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks. By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections. Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts. This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system. We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations. Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions [18.36339203254509]
F introduces a lightweight, function-based cloud execution model that finds its relevance in a range of applications like IoT-edge data processing and anomaly detection.
arXiv Detail & Related papers (2023-08-11T04:41:19Z)
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries. We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z)
When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL) Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors. In our method, each agent learns a dynamic model to predict future states and broadcast their predictions by communication, and then the policies are trained under the model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
A Meta Reinforcement Learning Approach for Predictive Autoscaling in the Cloud [10.970391043991363]
We propose an end-to-end predictive meta model-based RL algorithm, aiming to optimally allocate resource to maintain a stable CPU utilization level. Our algorithm not only ensures the predictability and accuracy of the scaling strategy, but also enables the scaling decisions to adapt to the changing workloads with high sample efficiency.
arXiv Detail & Related papers (2022-05-31T13:54:04Z)
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
A Reinforcement Learning-based Economic Model Predictive Control Framework for Autonomous Operation of Chemical Reactors [0.5735035463793008]
This work presents a novel framework for integrating EMPC and RL for online model parameter estimation of a class of nonlinear systems. The major advantage of this framework is its simplicity; state-of-the-art RL algorithms and EMPC schemes can be employed with minimal modifications.
arXiv Detail & Related papers (2021-05-06T13:34:30Z)
A Predictive Autoscaler for Elastic Batch Jobs [8.354712625979776]
Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and higher cost than conventional online service. We propose a predictive autoscaler to provide an elastic interface for the customers and overprovision instances.
arXiv Detail & Related papers (2020-10-10T17:35:55Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.