Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
- URL: http://arxiv.org/abs/2212.12180v5
- Date: Sun, 14 Apr 2024 06:07:55 GMT
- Title: Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
- Authors: Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, Francis Y. Yan
- Abstract summary: Autothrottle is a bi-level resource management framework for microservices with latency SLOs (service-level objectives).
It decouples application SLO feedback from service resource control and bridges them through the notion of performance targets.
Results show CPU savings of up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
- Score: 30.075132870154153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications progressively adopt microservices, resource managers are faced with two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between the two levels, however, is challenging because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. We present Autothrottle, a bi-level resource management framework for microservices with latency SLOs (service-level objectives). It architecturally decouples application SLO feedback from service resource control, and bridges them through the notion of performance targets. Specifically, an application-wide learning-based controller is employed to periodically set performance targets -- expressed as CPU throttle ratios -- for per-service heuristic controllers to attain. We evaluate Autothrottle on three microservice applications, with workload traces from production scenarios. Results show superior CPU savings, up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
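The bi-level split described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class `ServiceController`, the function `choose_target`, and all step sizes and bounds are hypothetical, and the learning-based application-wide controller is replaced here by a simple threshold rule purely for illustration. The throttle ratio is assumed to be the fraction of CPU scheduling periods in which a service was throttled (on Linux, cgroup `cpu.stat` exposes `nr_throttled` and `nr_periods`).

```python
from dataclasses import dataclass


@dataclass
class ServiceController:
    """Hypothetical per-service heuristic: steer a CPU quota toward a target throttle ratio."""
    quota_cores: float
    target_ratio: float
    step: float = 0.1        # assumed multiplicative adjustment per update
    min_cores: float = 0.1   # assumed lower bound on the quota
    max_cores: float = 16.0  # assumed upper bound on the quota

    def update(self, throttled_periods: int, total_periods: int) -> float:
        """Adjust the quota from one observation window and return the new quota."""
        if total_periods <= 0:
            return self.quota_cores  # no data in this window, keep the quota
        observed = throttled_periods / total_periods
        if observed > self.target_ratio:
            self.quota_cores *= 1 + self.step  # throttled too often: grant more CPU
        elif observed < self.target_ratio:
            self.quota_cores *= 1 - self.step  # rarely throttled: reclaim CPU
        self.quota_cores = min(self.max_cores, max(self.min_cores, self.quota_cores))
        return self.quota_cores


def choose_target(p99_ms: float, slo_ms: float, current: float, delta: float = 0.02) -> float:
    """Stand-in for the application-wide controller (a threshold rule, not a learned policy)."""
    if p99_ms > slo_ms:
        return max(0.0, current - delta)  # SLO violated: tolerate less throttling
    return min(1.0, current + delta)      # SLO met: allow more throttling to save CPU


if __name__ == "__main__":
    ctrl = ServiceController(quota_cores=2.0, target_ratio=0.10)
    ctrl.target_ratio = choose_target(p99_ms=250.0, slo_ms=200.0, current=ctrl.target_ratio)
    print(ctrl.target_ratio, ctrl.update(throttled_periods=30, total_periods=100))
```

In this sketch the application-level function only ever writes a target, and each service-level controller only ever reads its own throttle counters, which mirrors the decoupling the abstract describes.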
Related papers
- AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering [52.67783579040657]
AceGRPO is a machine learning system that prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines.
arXiv Detail & Related papers (2026-02-08T10:55:03Z) - Asynchronous MultiAgent Reinforcement Learning for 5G Routing under Side Constraints [1.0732935873226022]
We propose an asynchronous multi-agent reinforcement learning framework in which independent PPO agents plan routes in parallel and commit resource deltas to a shared global resource environment. We evaluate the method on an O-RAN-like network simulation using nearly real-time traffic data from the city of Montreal. AMARL achieves a similar Grade of Service (GoS) and end-to-end latency, with reduced training wall-clock time and improved robustness to demand shifts.
arXiv Detail & Related papers (2026-01-18T18:38:37Z) - NanoCockpit: Performance-optimized Application Framework for AI-based Autonomous Nanorobotics [50.594459728605734]
Small form factor, i.e., a few tens of grams, severely limits onboard computational resources to sub-100 mW microcontroller units (MCUs). Our framework achieves ideal end-to-end latency, i.e., zero overhead due to serialized tasks, delivering quantifiable improvements in closed-loop control performance.
arXiv Detail & Related papers (2026-01-12T12:29:38Z) - RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure [49.88201789074532]
Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. We present RollArt, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure.
arXiv Detail & Related papers (2025-12-27T11:14:23Z) - Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents [12.884297990127985]
Astraea is a service engine designed to shift the optimization from local segments to the global request lifecycle. It employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. Astraea reduces average JCT by up to 25.5% compared to baseline methods.
arXiv Detail & Related papers (2025-12-16T06:55:10Z) - SCUBA: Salesforce Computer Use Benchmark [63.66753028386581]
SCUBA is a benchmark designed to evaluate computer-use agents on customer relationship management (CRM) within the Salesforce platform. SCUBA contains 300 task instances derived from real user interviews, spanning three primary personas: platform administrators, sales representatives, and service agents. We benchmark a diverse set of agents under both zero-shot and demonstration-augmented settings.
arXiv Detail & Related papers (2025-09-30T16:48:49Z) - How to Train Your LLM Web Agent: A Statistical Diagnosis [102.04125085041473]
We present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT) and on-policy reinforcement learning. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++.
arXiv Detail & Related papers (2025-07-05T17:12:33Z) - ConsumerBench: Benchmarking Generative AI Applications on End-User Devices [6.6246058403368595]
The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices.
arXiv Detail & Related papers (2025-06-21T01:32:22Z) - Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests.
We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments.
We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z) - STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal features to address these challenges.
We propose an adaptive auto-scaling framework, STaleX, that integrates spatiotemporal features, enabling real-time resource adjustments.
Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload.
arXiv Detail & Related papers (2025-01-30T20:19:13Z) - Distilling Multi-modal Large Language Models for Autonomous Driving [64.63127269187814]
Recent end-to-end autonomous driving systems leverage large language models (LLMs) as planners to improve generalizability to rare events.
We propose DiMA, an end-to-end autonomous driving system that maintains the efficiency of an LLM-free (or vision-based) planner while leveraging the world knowledge of an LLM.
Training with DiMA results in a 37% reduction in the L2 trajectory error and an 80% reduction in the collision rate of the vision-based planner, as well as a 44% trajectory error reduction in longtail scenarios.
arXiv Detail & Related papers (2025-01-16T18:59:53Z) - FaaSRCA: Full Lifecycle Root Cause Analysis for Serverless Applications [9.14008416378655]
FaaSRCA is a full lifecycle root cause analysis method for serverless applications.
It integrates multi-modal observability data generated on the platform and application sides by using a Global Call Graph.
Based on the scores, we determine the root cause at the granularity of the lifecycle stage of serverless functions.
arXiv Detail & Related papers (2024-12-03T08:06:29Z) - SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.
SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.
We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z) - ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving [61.35068981176018]
ConServe is a large language model (LLM) serving system that achieves high throughput and strong online latency guarantees. We show that ConServe delivers an average of 2.2× higher throughput and reduces online serving tail latency by 2.9× on average compared to state-of-the-art systems.
arXiv Detail & Related papers (2024-10-02T04:12:13Z) - A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles [49.86094523878003]
We propose a decentralized incentive mechanism for mobile AIGC service allocation.
We employ multi-agent deep reinforcement learning to find the balance between the supply of AIGC services on RSUs and user demand for services within the IoV context.
arXiv Detail & Related papers (2024-03-29T12:46:07Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z) - DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv Detail & Related papers (2023-09-02T08:22:21Z) - A Framework for dynamically meeting performance objectives on a service mesh [0.0]
We present a framework for achieving end-to-end management objectives for multiple services that concurrently execute on a service mesh.
We apply reinforcement learning techniques to train an agent that periodically performs control actions to reallocate resources.
arXiv Detail & Related papers (2023-06-25T09:08:41Z) - Dynamically meeting performance objectives for multiple services on a service mesh [0.0]
We present a framework that lets a service provider achieve end-to-end management objectives under varying load.
We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, and service differentiation.
We compute the control policies not on the testbed, but in a simulator, which speeds up the learning process by orders of magnitude.
arXiv Detail & Related papers (2022-10-08T11:54:25Z) - Reinforcement Learning-based Dynamic Service Placement in Vehicular Networks [4.010371060637208]
The complexity of traffic mobility patterns and dynamics in the requests for different types of services has made service placement a challenging task.
A typical static placement solution is not effective as it does not consider the traffic mobility and service dynamics.
We propose a reinforcement learning-based dynamic (RL-Dynamic) service placement framework to find the optimal placement of services at the edge servers.
arXiv Detail & Related papers (2021-05-31T15:01:35Z) - Intelligent colocation of HPC workloads [0.0]
Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized.
It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique is to colocate multiple applications on the same server.
We show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware performance counters.
arXiv Detail & Related papers (2021-03-16T12:35:35Z) - Deep Learning-based Resource Allocation For Device-to-Device Communication [66.74874646973593]
We propose a framework for the optimization of the resource allocation in multi-channel cellular systems with device-to-device (D2D) communication.
A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models.
Our simulation results confirm that near-optimal performance can be attained with low computation time, which underlines the real-time capability of the proposed scheme.
arXiv Detail & Related papers (2020-11-25T14:19:23Z) - Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning [61.29990368322931]
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors.
Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers.
arXiv Detail & Related papers (2020-08-27T16:56:48Z)