Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling
- URL: http://arxiv.org/abs/2406.14141v1
- Date: Thu, 20 Jun 2024 09:34:24 GMT
- Title: Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling
- Authors: S. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson, Daniel Sadoc Menasché, Gugan Thoppe
- Abstract summary: This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues.
We begin by presenting the problem as weakly coupled Markov Decision Processes (MDPs), solvable via a linear program (LP).
Because the number of control variables of this LP grows combinatorially, we introduce a more tractable relaxed LP formulation and extend it to tackle online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.
- Score: 42.6574685545681
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as weakly coupled Markov Decision Processes (MDPs), solvable via a linear program (LP). However, as the number of control variables of such an LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.
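To make the two-timescale idea concrete, here is a minimal, self-contained sketch, not the paper's actual algorithm: each queue's subproblem is solved with the coupling constraint priced by a Lagrange multiplier, the multiplier is updated by slow dual ascent, and the unknown arrival probability is estimated online on the fast timescale. All instance sizes and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (all sizes hypothetical): N finite queues of capacity B,
# coupled only through a budget on how many servers may be active at once.
N, B, GAMMA = 3, 5, 0.95
SERVER_BUDGET = 2          # coupling: expected number of active servers <= budget
P_ARRIVAL_TRUE = 0.6       # unknown to the learner; estimated online

def subproblem_policy(lmbda, p_hat):
    """Solve one queue's MDP with the coupling term priced by lmbda.

    States: queue length 0..B. Actions: server off (0) or on (1).
    Cost: holding cost + lmbda per active server (Lagrangian decomposition).
    """
    V = np.zeros(B + 1)
    pi = np.zeros(B + 1, dtype=int)
    for _ in range(100):  # value iteration
        Q = np.zeros((B + 1, 2))
        for s in range(B + 1):
            for a in (0, 1):
                s_served = max(s - a, 0)          # one departure if server on
                up, stay = min(s_served + 1, B), s_served
                cost = s + lmbda * a              # holding cost + priced usage
                Q[s, a] = cost + GAMMA * (p_hat * V[up] + (1 - p_hat) * V[stay])
        V, pi = Q.min(axis=1), Q.argmin(axis=1)
    return pi

# Two-timescale loop: fast estimation of the arrival parameter,
# slow dual ascent on the multiplier of the coupled constraint.
lmbda, arrivals, steps = 1.0, 0, 0
states = np.zeros(N, dtype=int)
for t in range(1, 501):
    p_hat = arrivals / steps if steps else 0.5
    pi = subproblem_policy(lmbda, p_hat)
    acts = pi[states]
    for i in range(N):  # simulate each queue one step
        states[i] = max(states[i] - acts[i], 0)
        arrived = rng.random() < P_ARRIVAL_TRUE
        arrivals, steps = arrivals + arrived, steps + 1
        if arrived:
            states[i] = min(states[i] + 1, B)
    # Slow timescale: projected dual (sub)gradient step on the budget constraint.
    lmbda = max(lmbda + (1.0 / t) * (acts.sum() - SERVER_BUDGET), 0.0)

print(f"lambda ~ {lmbda:.2f}, p_hat ~ {arrivals / steps:.2f}")
```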
Related papers
- Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids [14.389086937116582]
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to solve optimal control problems in mixed-logical dynamical systems.
The proposed method significantly reduces the online computation time of the MPC approach and generates policies with small optimality gaps and high feasibility rates.
arXiv Detail & Related papers (2024-09-17T15:17:16Z)
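As a rough illustration of one way RL and MPC can be combined, a sketch under assumed dynamics and not the paper's method: a short-horizon MPC enumerates action sequences and uses a stand-in "learned" quadratic value function as terminal cost, which is a common way a learned component can cut online MPC computation.

```python
import itertools
import numpy as np

# Toy hybrid (all parameters hypothetical): short-horizon MPC whose terminal
# cost is a value function an RL stage could have learned, so a short horizon
# keeps long-term foresight.
A, B_GAIN, H = 1.2, 0.8, 3
U_GRID = np.linspace(-2.0, 2.0, 9)   # coarse action grid
P_LEARNED = 2.5                      # stand-in for a learned V(x) = P * x^2

def stage_cost(x, u):
    return x * x + 0.1 * u * u

def mpc_action(x0):
    """Enumerate action sequences over horizon H; score with the learned tail."""
    best_u, best_cost = 0.0, float("inf")
    for seq in itertools.product(U_GRID, repeat=H):
        x, cost = x0, 0.0
        for u in seq:
            cost += stage_cost(x, u)
            x = A * x + B_GAIN * u
        cost += P_LEARNED * x * x    # learned terminal value replaces a long tail
        if cost < best_cost:
            best_u, best_cost = seq[0], cost
    return best_u

x = 3.0
for t in range(10):                  # receding-horizon rollout
    u = mpc_action(x)
    x = A * x + B_GAIN * u
print(f"state after 10 steps: {x:.3f}")
```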
- Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks [7.182684187774442]
This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors.
We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines.
arXiv Detail & Related papers (2023-08-28T12:18:02Z)
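A toy, tabular stand-in for the DRL scheduler described above (the paper uses deep networks; every parameter here is hypothetical): Q-learning picks which user to serve each slot, rewarding in-deadline deliveries and penalizing expiries.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy deadline scheduler (all parameters hypothetical): each slot, one of
# K users can be served; unserved packets expire when their deadlines hit zero.
K, D_MAX, EPS, ALPHA, GAMMA = 3, 3, 0.1, 0.2, 0.9
Q = defaultdict(float)

def new_state():
    # State: tuple of remaining deadlines, one per user.
    return tuple(random.randint(1, D_MAX) for _ in range(K))

def step(state, a):
    """Serve user a, age the rest; reward deliveries, penalize expiries."""
    reward, nxt = 0, []
    for i, d in enumerate(state):
        if i == a:
            reward += 1                           # delivered within deadline
            nxt.append(random.randint(1, D_MAX))  # fresh packet arrives
        else:
            d -= 1
            if d == 0:
                reward -= 1                       # deadline miss
                nxt.append(random.randint(1, D_MAX))
            else:
                nxt.append(d)
    return tuple(nxt), reward

state = new_state()
for t in range(20000):
    if random.random() < EPS:
        a = random.randrange(K)                   # explore
    else:
        a = max(range(K), key=lambda u: Q[(state, u)])
    nxt, r = step(state, a)
    best_next = max(Q[(nxt, u)] for u in range(K))
    Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
    state = nxt

s = new_state()
print("greedy action in state", s, "->", max(range(K), key=lambda u: Q[(s, u)]))
```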
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
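The double-sampling issue arises because the variance contains the square of an expectation; one standard form of the Fenchel-duality trick (notation mine, not necessarily the paper's) rewrites that square variationally:

```latex
\operatorname{Var}(X) \;=\; \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2,
\qquad
\big(\mathbb{E}[X]\big)^2 \;=\; \max_{\nu \in \mathbb{R}} \big( 2\nu\,\mathbb{E}[X] - \nu^2 \big).
```

For fixed $\nu$ the objective is linear in $\mathbb{E}[X]$, so a single sample gives an unbiased gradient estimate; the maximizer is $\nu^\star = \mathbb{E}[X]$, and $\nu$ can be updated alongside the policy.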
- Learning-Assisted Algorithm Unrolling for Online Optimization with Budget Constraints [27.84415856657607]
We propose a new machine learning (ML) assisted unrolling approach, called LAAU (Learning-Assisted Algorithm Unrolling).
For efficient training via backpropagation, we derive gradients of the decision pipeline over time.
We also provide the average cost bounds for two cases when training data is available offline and collected online, respectively.
arXiv Detail & Related papers (2022-12-03T20:56:29Z)
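A minimal sketch of the unrolling idea (hypothetical allocation rule and costs, not LAAU itself): a parametric online decision rule is applied step by step under a hard budget, and autodiff backpropagates through the entire unrolled horizon, which is what "gradients of the decision pipeline over time" refers to above.

```python
import torch

torch.manual_seed(0)

# Toy unrolled pipeline (all names hypothetical): a parametric allocation
# rule runs sequentially under a hard budget; backpropagation flows through
# the whole unrolled horizon, in the spirit of learning-assisted unrolling.
T_STEPS, BUDGET = 8, 5.0
theta = torch.tensor(0.5, requires_grad=True)   # learnable rule parameter
opt = torch.optim.SGD([theta], lr=0.05)

prices = torch.rand(32, T_STEPS) + 0.5          # offline training sequences

for epoch in range(200):
    remaining = torch.full((32,), BUDGET)
    cost = torch.zeros(32)
    for t in range(T_STEPS):
        # Allocation rule: spend a theta-scaled amount, capped by the budget.
        want = torch.sigmoid(theta) * (2.0 - prices[:, t])
        x = torch.minimum(torch.clamp(want, min=0.0), remaining)
        remaining = remaining - x
        cost = cost + prices[:, t] * x - 2.0 * x   # price paid minus utility
    # Unused budget is wasted; penalize it so the rule learns to spend.
    loss = (cost + 0.5 * remaining).mean()
    opt.zero_grad()
    loss.backward()        # gradients flow through all T unrolled steps
    opt.step()

print(f"learned theta: {theta.item():.3f}")
```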
- Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making [0.0]
This paper studies a variant of the vehicle routing problem (VRP) where both customer locations and demands are uncertain.
The objective is to maximize the served demands while fulfilling vehicle capacities and time restrictions.
We develop a Q-learning algorithm featuring state-of-the-art acceleration techniques such as Replay Memory and Double Q Network.
arXiv Detail & Related papers (2021-09-21T14:28:09Z)
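A tabular sketch of the two acceleration techniques named above, replay memory and double Q-learning, on a toy chain MDP (the paper uses deep networks and a VRP environment; everything here is a stand-in):

```python
import random
from collections import deque

random.seed(1)

# Two Q-tables updated in the double-Q style to damp overestimation,
# with transitions replayed from a memory buffer. Toy chain MDP:
# action 1 moves right, action 0 moves left, reward 1 at the goal.
N_STATES, ACTIONS, GAMMA, ALPHA = 6, (0, 1), 0.95, 0.1
QA = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
QB = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
memory = deque(maxlen=500)                  # replay memory

def env_step(s, a):
    nxt = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

s = 0
for t in range(5000):
    a = random.choice(ACTIONS) if random.random() < 0.2 else \
        max(ACTIONS, key=lambda u: QA[(s, u)] + QB[(s, u)])
    nxt, r = env_step(s, a)
    memory.append((s, a, r, nxt))
    s = 0 if nxt == N_STATES - 1 else nxt   # episodic reset at the goal

    batch = random.sample(list(memory), min(16, len(memory)))
    for (si, ai, ri, ni) in batch:
        done = ni == N_STATES - 1
        if random.random() < 0.5:           # update QA, evaluate with QB
            a_star = max(ACTIONS, key=lambda u: QA[(ni, u)])
            target = ri if done else ri + GAMMA * QB[(ni, a_star)]
            QA[(si, ai)] += ALPHA * (target - QA[(si, ai)])
        else:                               # symmetric update for QB
            a_star = max(ACTIONS, key=lambda u: QB[(ni, u)])
            target = ri if done else ri + GAMMA * QA[(ni, a_star)]
            QB[(si, ai)] += ALPHA * (target - QB[(si, ai)])

print("greedy policy:", [max(ACTIONS, key=lambda u: QA[(s, u)] + QB[(s, u)])
                         for s in range(N_STATES)])
```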
- Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (DPDP) to combine the strengths of learned neural heuristics with those of dynamic programming algorithms.
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
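A toy rendition of policy-restricted dynamic programming (not DPDP's actual implementation): exact Held-Karp TSP states are expanded layer by layer, but only a beam of states favored by an edge "heat map" survives; a random heat map stands in for the trained network's edge predictions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Beam-restricted Held-Karp on a random symmetric TSP instance.
N, BEAM = 8, 50
dist = rng.random((N, N))
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0)
heat = rng.random((N, N))   # stand-in for learned edge probabilities

# DP state: (frozenset visited, last city) -> (cost so far, heat score, tour).
layer = {(frozenset([0]), 0): (0.0, 0.0, [0])}
for _ in range(N - 1):
    nxt = {}
    for (visited, last), (cost, score, tour) in layer.items():
        for c in range(N):
            if c in visited:
                continue
            key = (visited | {c}, c)
            cand = (cost + dist[last, c], score + heat[last, c], tour + [c])
            if key not in nxt or cand[0] < nxt[key][0]:
                nxt[key] = cand   # keep the cheapest tour per DP state
    # Policy restriction: keep only the states the heat map likes best.
    layer = dict(sorted(nxt.items(), key=lambda kv: -kv[1][1])[:BEAM])

best = min(layer.values(), key=lambda v: v[0] + dist[v[2][-1], 0])
print("tour:", best[2] + [0], "cost:", round(best[0] + dist[best[2][-1], 0], 3))
```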
- Dynamic RAN Slicing for Service-Oriented Vehicular Networks via Constrained Learning [40.5603189901241]
We investigate a radio access network (RAN) slicing problem for Internet of vehicles (IoV) services with different quality of service (QoS) requirements.
A dynamic RAN slicing framework is presented to dynamically allocate radio spectrum and computing resource.
We show that the proposed RAWS scheme effectively reduces the system cost while satisfying QoS requirements with a high probability, as compared with benchmarks.
arXiv Detail & Related papers (2020-12-03T15:08:38Z)
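The constrained-learning component can be summarized by a generic primal-dual template (my notation; the paper's RAWS algorithm and its exact constraints are not reproduced here): maximize reward subject to a QoS cost budget,

```latex
\max_{\theta} \; J_r(\pi_\theta) \quad \text{s.t.} \quad J_c(\pi_\theta) \le d,
\qquad
\begin{aligned}
\theta_{k+1} &= \theta_k + \alpha \, \nabla_\theta \big( J_r(\pi_{\theta_k}) - \lambda_k\, J_c(\pi_{\theta_k}) \big),\\
\lambda_{k+1} &= \big[ \lambda_k + \beta \big( \hat{J}_c - d \big) \big]_{+},
\end{aligned}
```

where the multiplier $\lambda$ rises while the estimated constraint cost $\hat{J}_c$ exceeds the budget $d$, steering the policy toward feasibility.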
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
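A minimal sketch of the parameter-server pattern PARTEL builds on (sizes and the least-squares task are hypothetical): the model is partitioned across workers, each worker computes the gradient of its own partition, and the server aggregates per round; PARTEL's actual contribution, the joint subcarrier/parameter/power allocation, is only hinted at in a comment.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy parameter-server round: the model is split into per-worker partitions;
# each worker returns the gradient of its partition and the server aggregates.
D, K, LR = 12, 3, 0.1                      # model size, workers, step size
w = rng.normal(size=D)                     # global model held at the server
X, y = rng.normal(size=(64, D)), rng.normal(size=64)
parts = np.array_split(np.arange(D), K)    # one parameter partition per worker

def partial_gradient(w_full, idx):
    """Worker-side least-squares gradient, restricted to its partition."""
    residual = X @ w_full - y
    return X[:, idx].T @ residual / len(y)

for rnd in range(100):                     # synchronized training rounds
    grads = [partial_gradient(w, idx) for idx in parts]   # "uplink" phase
    for idx, g in zip(parts, grads):       # server updates each partition
        w[idx] -= LR * g
    # (A real system would also allocate subcarriers and power per worker here.)

print("final loss:", round(float(np.mean((X @ w - y) ** 2)), 4))
```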
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\mathrm{polylog}(T)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
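For reference, one standard definition of adaptive-control regret consistent with the $\mathrm{polylog}(T)$ claim above (notation mine, not the paper's):

```latex
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} c_t \;-\; T\,J_\star,
\qquad
\mathrm{Regret}(T) \;=\; \mathcal{O}\big(\mathrm{polylog}(T)\big),
```

where $c_t$ is the cost incurred at step $t$ and $J_\star$ is the average cost of the best controller in hindsight.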