Related papers: An SLO Driven and Cost-Aware Autoscaling Framework for Kubernetes

An SLO Driven and Cost-Aware Autoscaling Framework for Kubernetes

URL: http://arxiv.org/abs/2512.23415v1
Date: Mon, 29 Dec 2025 12:20:46 GMT
Title: An SLO Driven and Cost-Aware Autoscaling Framework for Kubernetes
Authors: Vinoth Punniyamoorthy, Bikesh Kumar, Sumit Saha, Lokesh Butra, Mayilsamy Palanigounder, Akash Kumar Agarwal, Kabilan Kannan,
Abstract summary: This paper investigates how autoscaling can be enhanced using AIOps principles to jointly satisfy Service Level Objective violations and cost constraints.<n>We propose a safe and explainable multi-signal autoscaling framework that integrates SLO-aware and cost-conscious control with lightweight demand forecasting.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kubernetes provides native autoscaling mechanisms, including the Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and node-level autoscalers, to enable elastic resource management for cloud-native applications. However, production environments frequently experience Service Level Objective violations and cost inefficiencies due to reactive scaling behavior, limited use of application-level signals, and opaque control logic. This paper investigates how Kubernetes autoscaling can be enhanced using AIOps principles to jointly satisfy SLO and cost constraints under diverse workload patterns without compromising safety or operational transparency. We present a gap-driven analysis of existing autoscaling approaches and propose a safe and explainable multi-signal autoscaling framework that integrates SLO-aware and cost-conscious control with lightweight demand forecasting. Experimental evaluation using representative microservice and event-driven workloads shows that the proposed approach reduces SLO violation duration by up to 31 percent, improves scaling response time by 24 percent, and lowers infrastructure cost by 18 percent compared to default and tuned Kubernetes autoscaling baselines, while maintaining stable and auditable control behavior. These results demonstrate that AIOps-driven, SLO-first autoscaling can significantly improve the reliability, efficiency, and operational trustworthiness of Kubernetes-based cloud platforms.

Related papers

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs [64.60188746073904]
Flow-based vision-language-action models excel in embodied control but suffer from intractable likelihoods during multi-step sampling.<n>We propose textbftextit$boldsymbol$-StepNFT (Step-wise Negative-aware Fine-Tuning), a critic-and-likelihood-free framework that requires only a single forward pass per optimization step.
arXiv Detail & Related papers (2026-03-02T17:04:49Z)
ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimize the accuracy-efficiency trade-off via principled resource allocation.<n>We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications.<n>We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling [55.026048429595384]
Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs.<n>We propose SeerSC, a dynamic self-consistency framework that simultaneously improves token efficiency and latency.
arXiv Detail & Related papers (2025-11-12T13:57:43Z)
Dynamic Speculative Agent Planning [57.630218933994534]
Large language-model-based agents face critical deployment challenges due to prohibitive latency and inference costs.<n>We introduce Dynamic Speculative Planning (DSP), an online reinforcement learning framework that provides lossless acceleration with substantially reduced costs.<n>Experiments on two standard agent benchmarks demonstrate that DSP achieves comparable efficiency to the fastest acceleration method while reducing total cost by 30% and unnecessary cost up to 60%.
arXiv Detail & Related papers (2025-09-02T03:34:36Z)
STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, andtemporals to address these challenges.<n>We propose an adaptive auto-scaling framework, STXale, that integrates features, enabling real-time resource adjustments.<n>Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload.
arXiv Detail & Related papers (2025-01-30T20:19:13Z)
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [56.66677293607114]
We propose Code-as-Monitor (CaM) for both open-set reactive and proactive failure detection.<n>To enhance the accuracy and efficiency of monitoring, we introduce constraint elements that abstract constraint-related entities.<n>Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances.
arXiv Detail & Related papers (2024-12-05T18:58:27Z)
OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework [3.8320050452121692]
We introduce OWLed, the Outlier-Weighed Layerwise Pruning for Efficient Autonomous Driving Framework.<n>Our method assigns non-uniform sparsity ratios to different layers based on the distribution of outlier features.<n>To ensure the compressed model adapts well to autonomous driving tasks, we incorporate driving environment data into both the calibration and pruning processes.
arXiv Detail & Related papers (2024-11-12T10:55:30Z)
Multi-Level ML Based Burst-Aware Autoscaling for SLO Assurance and Cost Efficiency [3.5624365288866007]
This paper introduces BAScaler, a Burst-Aware Autoscaling framework for containerized cloud services or applications under complex workloads. BAScaler incorporates a novel prediction-based burst detection mechanism that distinguishes between predictable periodic workload spikes and actual bursts.
arXiv Detail & Related papers (2024-02-20T12:28:25Z)
OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud [10.97507717758812]
We propose OptScaler, a collaborative autoscaling framework that integrates proactive and reactive modules through an optimization module.<n> Numerical results have demonstrated the superiority of our workload prediction model and the collaborative framework.
arXiv Detail & Related papers (2023-10-26T04:38:48Z)
A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions [18.36339203254509]
F introduces a lightweight, function-based cloud execution model that finds its relevance in a range of applications like IoT-edge data processing and anomaly detection.
arXiv Detail & Related papers (2023-08-11T04:41:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.