Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report
- URL: http://arxiv.org/abs/2404.04106v1
- Date: Fri, 5 Apr 2024 14:02:04 GMT
- Title: Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report
- Authors: Jerrod Wigmore, Brooke Shrader, Eytan Modiano,
- Abstract summary: This work proposes Online Deep Reinforcement Learning-based Controls (ODRLC) as an alternative to traditional Deep Reinforcement Learning (DRL) methods.
ODRLC uses online interactions to learn optimal control policies for queuing networks (SQNs)
We introduce a method to design these intervention-assisted policies to ensure strong stability of the network.
- Score: 1.4201040196058878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning (DRL) offers a powerful approach to training neural network control policies for stochastic queuing networks (SQN). However, traditional DRL methods rely on offline simulations or static datasets, limiting their real-world application in SQN control. This work proposes Online Deep Reinforcement Learning-based Controls (ODRLC) as an alternative, where an intelligent agent interacts directly with a real environment and learns an optimal control policy from these online interactions. SQNs present a challenge for ODRLC due to the unbounded nature of the queues within the network resulting in an unbounded state-space. An unbounded state-space is particularly challenging for neural network policies as neural networks are notoriously poor at extrapolating to unseen states. To address this challenge, we propose an intervention-assisted framework that leverages strategic interventions from known stable policies to ensure the queue sizes remain bounded. This framework combines the learning power of neural networks with the guaranteed stability of classical control policies for SQNs. We introduce a method to design these intervention-assisted policies to ensure strong stability of the network. Furthermore, we extend foundational DRL theorems for intervention-assisted policies and develop two practical algorithms specifically for ODRLC of SQNs. Finally, we demonstrate through experiments that our proposed algorithms outperform both classical control approaches and prior ODRLC algorithms.
Related papers
- Differentiable Discrete Event Simulation for Queuing Network Control [7.965453961211742]
Queueing network control poses distinct challenges, including highity, large state and action spaces, and lack of stability.
We propose a scalable framework for policy optimization based on differentiable discrete event simulation.
Our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments.
arXiv Detail & Related papers (2024-09-05T17:53:54Z) - Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning [69.00997996453842]
We propose a deep Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for virtual network embedding.
We show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue.
arXiv Detail & Related papers (2024-06-25T07:42:30Z) - Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies [7.9898826915621965]
We use a verification procedure that trains another neural network, which acts as a certificate proving that the policy satisfies the task.
For reach-avoid tasks, it suffices to show that this certificate network is a reach-avoid supermartingale (RASM)
arXiv Detail & Related papers (2024-06-02T18:19:19Z) - Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Symbolic Distillation for Learned TCP Congestion Control [70.27367981153299]
TCP congestion control has achieved tremendous success with deep reinforcement learning (RL) approaches.
Black-box policies lack interpretability and reliability, and often, they need to operate outside the traditional TCP datapath.
This paper proposes a novel two-stage solution to achieve the best of both worlds: first, to train a deep RL agent, then distill its NN policy into white-box, light-weight rules.
arXiv Detail & Related papers (2022-10-24T00:58:16Z) - Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control [37.10638636086814]
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless control system (WNCS) with a limited number of frequency channels.
We develop a deep reinforcement learning (DRL) based framework for solving it.
To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods.
arXiv Detail & Related papers (2021-09-26T11:27:12Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z) - RL-QN: A Reinforcement Learning Framework for Optimal Control of
Queueing Systems [8.611328447624677]
We consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks.
Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem.
We propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space.
arXiv Detail & Related papers (2020-11-14T22:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.