Related papers: Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices

URL: http://arxiv.org/abs/2212.12180v5
Date: Sun, 14 Apr 2024 06:07:55 GMT
Title: Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
Authors: Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, Francis Y. Yan
Abstract summary: Autothrottle is a bi-level resource management framework for microservices with latency service-level objectives (SLOs). It decouples application SLO feedback from service resource control and bridges the two through the notion of performance targets. Results show CPU savings of up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications progressively adopt microservices, resource managers are faced with two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between the two levels, however, is challenging because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. We present Autothrottle, a bi-level resource management framework for microservices with latency SLOs (service-level objectives). It architecturally decouples application SLO feedback from service resource control, and bridges them through the notion of performance targets. Specifically, an application-wide learning-based controller is employed to periodically set performance targets -- expressed as CPU throttle ratios -- for per-service heuristic controllers to attain. We evaluate Autothrottle on three microservice applications, with workload traces from production scenarios. Results show superior CPU savings, up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
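The bi-level split described in the abstract can be illustrated with a minimal Python sketch. It assumes Linux cgroup v2 CFS bandwidth control, where a service's throttle ratio is taken here as the fraction of enforcement periods that were throttled (the `nr_periods` and `nr_throttled` counters in `cpu.stat`) and its CPU limit is expressed as a `cpu.max` line. The multiplicative quota rule in `ServiceController`, its step size, the discrete target set, and the threshold rule in `pick_target` are all hypothetical stand-ins: they show where each level sits, not Autothrottle's actual per-service heuristic or its learning-based controller.

```python
from dataclasses import dataclass


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse cgroup v2 cpu.stat content ("<key> <value>" per line) into a dict."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttle_ratio(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Fraction of CFS periods in the interval during which the cgroup was throttled."""
    periods = curr["nr_periods"] - prev["nr_periods"]
    throttled = curr["nr_throttled"] - prev["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


def to_cpu_max(cores: float, period_us: int = 100_000) -> str:
    """Render a CPU limit in cores as a cgroup v2 cpu.max line: "<quota_us> <period_us>"."""
    return f"{int(round(cores * period_us))} {period_us}"


@dataclass
class ServiceController:
    """Hypothetical per-service heuristic: steer the quota toward a target throttle ratio."""
    quota_cores: float
    step: float = 0.1       # hypothetical multiplicative step per control interval
    min_cores: float = 0.1
    max_cores: float = 8.0

    def update(self, observed: float, target: float) -> float:
        if observed > target:    # throttled more often than allowed: grant more CPU
            self.quota_cores *= 1 + self.step
        elif observed < target:  # throttled less often than allowed: reclaim CPU
            self.quota_cores *= 1 - self.step
        self.quota_cores = min(self.max_cores, max(self.min_cores, self.quota_cores))
        return self.quota_cores


TARGET_CHOICES = (0.0, 0.05, 0.1, 0.2, 0.3)  # hypothetical discrete set of throttle targets


def pick_target(latency_ms: float, slo_ms: float, current: float) -> float:
    """Hypothetical application-level rule (NOT the paper's learning-based controller):
    lower the allowed throttle ratio after an SLO violation, raise it otherwise."""
    i = TARGET_CHOICES.index(current)
    if latency_ms > slo_ms:
        i = max(0, i - 1)                        # less throttling allowed -> more CPU
    else:
        i = min(len(TARGET_CHOICES) - 1, i + 1)  # more throttling allowed -> less CPU
    return TARGET_CHOICES[i]


if __name__ == "__main__":
    # One control round with illustrative counter values.
    prev = parse_cpu_stat("nr_periods 100\nnr_throttled 10")
    curr = parse_cpu_stat("nr_periods 200\nnr_throttled 40")
    target = pick_target(latency_ms=120.0, slo_ms=100.0, current=0.1)  # SLO missed -> 0.05
    ctl = ServiceController(quota_cores=1.0)
    cores = ctl.update(throttle_ratio(prev, curr), target)  # 0.3 > 0.05 -> 1.1 cores
    print(to_cpu_max(cores))  # prints "110000 100000"
```

The point of the split is that the application level only ever emits a number (the target throttle ratio) while each service translates that number into a concrete CPU limit on its own, so the two levels can run at different cadences. Relaxing the target on every SLO-meeting interval, as `pick_target` does, is deliberately naive; the abstract describes a learning-based controller that sets targets periodically, which this rule does not reproduce.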

Related papers

Reinforcement Learning for Long-Horizon Interactive LLM Agents
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv (2025-02-03T18:35:42Z)
STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices
This paper presents a combination of control theory, machine learning, and heuristics to address these challenges. We propose an adaptive auto-scaling framework, STaleX, that integrates spatiotemporal features, enabling real-time resource adjustments. Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload.
arXiv (2025-01-30T20:19:13Z)
Distilling Multi-modal Large Language Models for Autonomous Driving
Recent end-to-end autonomous driving systems leverage large language models (LLMs) as planners to improve generalizability to rare events. We propose DiMA, an end-to-end autonomous driving system that maintains the efficiency of an LLM-free (or vision-based) planner while leveraging the world knowledge of an LLM. Training with DiMA results in a 37% reduction in the L2 trajectory error and an 80% reduction in the collision rate of the vision-based planner, as well as a 44% trajectory error reduction in longtail scenarios.
arXiv (2025-01-16T18:59:53Z)
FaaSRCA: Full Lifecycle Root Cause Analysis for Serverless Applications
FaaSRCA is a full lifecycle root cause analysis method for serverless applications. It integrates multi-modal observability data generated on the platform and application sides by using a Global Call Graph. Based on the scores, we determine the root cause at the granularity of the lifecycle stage of serverless functions.
arXiv (2024-12-03T08:06:29Z)
SeBS-Flow: Benchmarking Serverless Cloud Function Workflows
We propose the first serverless workflow benchmarking suite SeBS-Flow. SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns. We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv (2024-10-04T14:52:18Z)
A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles
We propose a decentralized incentive mechanism for mobile AIGC service allocation. We employ multi-agent deep reinforcement learning to find the balance between the supply of AIGC services on RSUs and user demand for services within the IoV context.
arXiv (2024-03-29T12:46:07Z)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv (2024-02-29T16:07:22Z)
DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach. It focuses on handling service dependencies to optimize service-level agreement (SLA) assurance and cost efficiency. Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv (2023-09-02T08:22:21Z)
A Framework for dynamically meeting performance objectives on a service mesh
We present a framework for achieving end-to-end management objectives for multiple services that concurrently execute on a service mesh. We apply reinforcement learning techniques to train an agent that periodically performs control actions on real resources.
arXiv (2023-06-25T09:08:41Z)
Dynamically meeting performance objectives for multiple services on a service mesh
We present a framework that lets a service provider achieve end-to-end management objectives under varying load. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, and service differentiation. We compute the control policies not on the testbed, but in a simulator, which speeds up the learning process by orders of magnitude.
arXiv (2022-10-08T11:54:25Z)
Reinforcement Learning-based Dynamic Service Placement in Vehicular Networks
The complexity of traffic mobility patterns and the dynamics of requests for different types of services have made service placement a challenging task. A typical static placement solution is not effective because it does not consider traffic mobility and service dynamics. We propose a reinforcement learning-based dynamic (RL-Dynamic) service placement framework to find the optimal placement of services at the edge servers.
arXiv (2021-05-31T15:01:35Z)
Intelligent colocation of HPC workloads
Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized. It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique is to colocate multiple applications on the same server. We show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware performance counters.
arXiv (2021-03-16T12:35:35Z)
Deep Learning-based Resource Allocation For Device-to-Device Communication
We propose a framework for the optimization of resource allocation in multi-channel cellular systems with device-to-device (D2D) communication. A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models. Our simulation results confirm that near-optimal performance can be attained with low computation time, which underlines the real-time capability of the proposed scheme.
arXiv (2020-11-25T14:19:23Z)
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors. Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers.
arXiv (2020-08-27T16:56:48Z)
