Processing Network Controls via Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2205.02119v1
- Date: Sun, 1 May 2022 04:34:21 GMT
- Title: Processing Network Controls via Deep Reinforcement Learning
- Authors: Mark Gluzman
- Abstract summary: The dissertation is concerned with the theoretical justification and practical application of advanced policy gradient (APG) algorithms.
Policy improvement bounds play a crucial role in the theoretical justification of the APG algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Novel advanced policy gradient (APG) algorithms, such as proximal policy
optimization (PPO), trust region policy optimization, and their variations,
have become the dominant reinforcement learning (RL) algorithms because of
their ease of implementation and good practical performance. This dissertation
is concerned with theoretical justification and practical application of the
APG algorithms for solving processing network control optimization problems.
Processing network control problems are typically formulated as Markov decision
process (MDP) or semi-Markov decision process (SMDP) problems that have several
features unconventional for RL: infinite state spaces, unbounded costs, and
long-run average cost objectives. Policy improvement bounds play a crucial role
in the theoretical justification of the APG algorithms. In this thesis we
refine existing bounds for MDPs with finite state spaces and prove novel policy
improvement bounds for classes of MDPs and SMDPs used to model processing
network operations. We consider two examples of processing network control
problems and customize the PPO algorithm to solve them. First, we consider
control of parallel-server systems and multiclass queueing networks. Second, we
consider the driver repositioning problem in a ride-hailing service system. For both
examples the PPO algorithm with auxiliary modifications consistently generates
control policies that outperform state-of-the-art heuristics.
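For context, a representative bound of this kind for finite-state, discounted MDPs (the form popularized by the trust region policy optimization analysis of Schulman et al., 2015; stated here for reward maximization, as background rather than as the thesis's own result) is

    \eta(\tilde{\pi}) \;\ge\; L_{\pi}(\tilde{\pi}) - \frac{4\,\epsilon\,\gamma}{(1-\gamma)^2}\,\big(D^{\max}_{\mathrm{TV}}(\pi,\tilde{\pi})\big)^2,
    \qquad L_{\pi}(\tilde{\pi}) = \eta(\pi) + \sum_{s}\rho_{\pi}(s)\sum_{a}\tilde{\pi}(a\mid s)\,A_{\pi}(s,a),
    \qquad \epsilon = \max_{s,a}\,|A_{\pi}(s,a)|,

where \eta is the discounted return, \rho_{\pi} the discounted state-visitation frequencies, A_{\pi} the advantage function, and D^{\max}_{\mathrm{TV}} the maximal total-variation distance between the two policies. Bounds of this type guarantee improvement whenever the surrogate gain L_{\pi}(\tilde{\pi}) - \eta(\pi) exceeds the penalty term, which is what makes PPO/TRPO-style updates safe; the thesis refines such bounds and proves analogues for the infinite-state, average-cost MDPs and SMDPs described above.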
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
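For orientation, the generic constrained RL problem that such last-iterate methods address, together with its standard Lagrangian (primal-dual) relaxation, can be written as (a textbook formulation, not necessarily C-PG's exact setup)

    \max_{\theta} J_r(\theta) \ \text{ s.t. } \ J_{c_i}(\theta) \le b_i,\ i=1,\dots,m,
    \qquad\text{equivalently}\qquad
    \max_{\theta}\,\min_{\lambda \ge 0}\; J_r(\theta) - \sum_{i=1}^{m} \lambda_i\,\big(J_{c_i}(\theta) - b_i\big),

where J_r and J_{c_i} are expected reward and cost returns and \lambda are dual variables updated alongside the policy parameters \theta.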
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints [36.16736392624796]
We introduce a new policy optimization algorithm with function approximation for constrained MDPs with the average criterion.
We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm.
We show its superior empirical performance when compared to other state-of-the-art algorithms adapted for ACMDPs.
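For reference, the average criterion mentioned here (and used throughout the dissertation above) is the long-run average of the per-stage quantity; in standard notation (not necessarily this paper's),

    J_r(\pi) = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} r(s_t,a_t)\Big],
    \qquad \max_{\pi} J_r(\pi) \ \text{ s.t. } \ J_{c_i}(\pi) \le b_i,\ i=1,\dots,m,

with the constraint functionals J_{c_i} defined analogously from the per-stage costs c_i.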
arXiv Detail & Related papers (2023-02-02T00:23:36Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
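A one-line sketch of the Fenchel-duality trick referred to here (this is the standard conjugate identity; the paper's exact estimator may differ): the variance penalty contains the squared mean (\mathbb{E}[X])^2, whose gradient would otherwise require two independent samples of X, and the identity

    (\mathbb{E}[X])^2 = \max_{\nu \in \mathbb{R}} \big( 2\,\nu\,\mathbb{E}[X] - \nu^2 \big)

turns the square into a maximization over a scalar dual variable \nu, so that, with \nu optimized jointly, each gradient step needs only single samples of X.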
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, based on a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (DPDP) to combine the strengths of learned neural heuristics with those of dynamic programming algorithms.
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
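For context, the classical Frank-Wolfe (conditional gradient) update for maximizing a smooth objective f over a convex constraint set C, which is the building block the title alludes to (the paper's coupling of this step with policy-gradient updates is more involved), is

    s_k = \arg\max_{s \in C}\,\langle s, \nabla f(x_k)\rangle,
    \qquad x_{k+1} = (1-\gamma_k)\,x_k + \gamma_k\,s_k,

which stays feasible without any projection step because every iterate is a convex combination of points in C.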
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
- Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a Proximal Policy Optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform the state of the art in the literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
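Both the dissertation above and this last entry build on PPO's clipped surrogate objective. The following is a minimal NumPy sketch of that objective; the function and variable names are illustrative placeholders rather than the authors' code, and the three variance-reduction techniques mentioned above would enter through how the advantage estimates are produced, which is not shown here.

    import numpy as np

    def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
        # Clipped PPO surrogate (to be maximized), averaged over a batch of
        # sampled state-action pairs. For a long-run average-cost problem the
        # advantages would come from a *relative* value function estimate.
        ratio = np.exp(logp_new - logp_old)                      # importance ratios
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

    # Toy usage with synthetic numbers (illustrative only).
    rng = np.random.default_rng(0)
    logp_old = rng.normal(-1.0, 0.3, size=128)
    logp_new = logp_old + rng.normal(0.0, 0.05, size=128)
    adv = rng.standard_normal(128)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)                # normalized advantages
    print(ppo_clip_objective(logp_new, logp_old, adv))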
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.