PROMPT: Learning Dynamic Resource Allocation Policies for Network
Applications
- URL: http://arxiv.org/abs/2201.07916v2
- Date: Sat, 25 Mar 2023 01:07:57 GMT
- Title: PROMPT: Learning Dynamic Resource Allocation Policies for Network
Applications
- Authors: Drew Penney, Bin Li, Jaroslaw Sydir, Lizhong Chen, Charlie Tai, Stefan
Lee, Eoin Walsh, Thomas Long
- Abstract summary: We propose PROMPT, a novel resource allocation framework using proactive prediction to guide a reinforcement learning controller.
We show that PROMPT incurs 4.2x fewer violations, reduces severity of policy violations by 12.7x, improves best-effort workload performance, and improves overall power efficiency over prior work.
- Score: 16.812611987082082
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A growing number of service providers are exploring methods to improve server
utilization and reduce power consumption by co-scheduling high-priority
latency-critical workloads with best-effort workloads. This practice requires
strict resource allocation between workloads to reduce contention and maintain
Quality-of-Service (QoS) guarantees. Prior work demonstrated promising
opportunities to dynamically allocate resources based on workload demand, but
may fail to meet QoS objectives in more stringent operating environments due to
the presence of resource allocation cliffs, transient fluctuations in workload
performance, and rapidly changing resource demand. We therefore propose PROMPT,
a novel resource allocation framework using proactive QoS prediction to guide a
reinforcement learning controller. PROMPT enables more precise resource
optimization, more consistent handling of transient behaviors, and more robust
generalization when co-scheduling new best-effort workloads not encountered
during policy training. Evaluation shows that the proposed method incurs 4.2x
fewer QoS violations, reduces severity of QoS violations by 12.7x, improves
best-effort workload performance, and improves overall power efficiency over
prior work.
Related papers
- Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning [61.8360232713375]
We propose a reinforcement-based multi-source meta-transfer learning framework (Meta-RTL) for low-resource commonsense reasoning.
We present a reinforcement-based approach to dynamically estimating source task weights that measure the contribution of the corresponding tasks to the target task in the meta-transfer learning.
Experimental results demonstrate that Meta-RTL substantially outperforms strong baselines and previous task selection strategies.
arXiv Detail & Related papers (2024-09-27T18:22:22Z) - Reinforcement Learning-Based Adaptive Load Balancing for Dynamic Cloud Environments [0.0]
We propose a novel adaptive load balancing framework using Reinforcement Learning (RL) to address these challenges.
Our framework is designed to dynamically reallocate tasks to minimize latency and ensure balanced resource usage across servers.
Experimental results show that the proposed RL-based load balancer outperforms traditional algorithms in terms of response time, resource utilization, and adaptability to changing workloads.
arXiv Detail & Related papers (2024-09-07T19:40:48Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO)
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - Multi-Level ML Based Burst-Aware Autoscaling for SLO Assurance and Cost
Efficiency [3.5624365288866007]
This paper introduces BAScaler, a Burst-Aware Autoscaling framework for containerized cloud services or applications under complex workloads.
BAScaler incorporates a novel prediction-based burst detection mechanism that distinguishes between predictable periodic workload spikes and actual bursts.
arXiv Detail & Related papers (2024-02-20T12:28:25Z) - RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud
Environments [7.825552412435501]
We propose a novel framework for fast, fully-online resource allocation policy learning in dynamic operating environments.
We show that our framework can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art.
arXiv Detail & Related papers (2023-04-10T18:04:39Z) - Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via
Conformal Prediction [72.59079526765487]
The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly enhance the efficiency of coexisting services.
The main challenge is posed by the uncertainty in the process of URLLC packet generation.
We introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor.
arXiv Detail & Related papers (2023-02-15T14:09:55Z) - Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitudes faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z) - Optimal Resource Allocation for Serverless Queries [8.59568779761598]
Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time.
We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
arXiv Detail & Related papers (2021-07-19T02:55:48Z) - Coordinated Online Learning for Multi-Agent Systems with Coupled
Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z) - ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization
Of Ephemeral Cloud Resources [2.205500582481277]
We propose a Reinforcement Learning strategy for optimizing the ephemeral resources' utilization in the cloud.
Our solution reduces significantly the SLA violation penalties on average by 2.7x and up to 3.4x.
It also improves considerably the CPs' potential savings by 27.6% on average and up to 43.6%.
arXiv Detail & Related papers (2020-09-23T15:19:28Z) - Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep
Learning [61.29990368322931]
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors.
Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers.
arXiv Detail & Related papers (2020-08-27T16:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.