Related papers: Differentiable Discrete Event Simulation for Queuing Network Control

URL: http://arxiv.org/abs/2409.03740v1
Date: Thu, 5 Sep 2024 17:53:54 GMT
Title: Differentiable Discrete Event Simulation for Queuing Network Control
Authors: Ethan Che, Jing Dong, Hongseok Namkoong
Abstract summary: Queueing network control poses distinct challenges, including high stochasticity, large state and action spaces, and lack of stability. We propose a scalable framework for policy optimization based on differentiable discrete event simulation. Our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments.
Score: 7.965453961211742
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Queuing network control is essential for managing congestion in job-processing systems such as service systems, communication networks, and manufacturing processes. Despite growing interest in applying reinforcement learning (RL) techniques, queueing network control poses distinct challenges, including high stochasticity, large state and action spaces, and lack of stability. To tackle these challenges, we propose a scalable framework for policy optimization based on differentiable discrete event simulation. Our main insight is that by implementing a well-designed smoothing technique for discrete event dynamics, we can compute pathwise policy gradients for large-scale queueing networks using auto-differentiation software (e.g., TensorFlow, PyTorch) and GPU parallelization. Through extensive empirical experiments, we observe that our policy gradient estimators are several orders of magnitude more accurate than typical REINFORCE-based estimators. In addition, we propose a new policy architecture, which drastically improves stability while maintaining the flexibility of neural-network policies. In a wide variety of scheduling and admission control tasks, we demonstrate that training control policies with pathwise gradients leads to a 50-1000x improvement in sample efficiency over state-of-the-art RL methods. Unlike prior tailored approaches to queueing, our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments and those with non-exponential interarrival/service times.
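Illustration: the abstract contrasts pathwise gradient estimators with REINFORCE-based (score-function) estimators. The sketch below is not the paper's method and involves no queueing simulation. It is a toy example, under the assumption of a Gaussian sample x ~ N(theta, 1) and the objective E[x^2], showing why a pathwise estimator can have much lower variance than a score-function estimator of the same gradient. The function name and parameters are hypothetical.

```python
import random
import statistics


def gradient_estimates(theta, n_samples, seed=0):
    """Per-sample estimates of d/dtheta E[x^2] for x ~ N(theta, 1).

    Toy setting chosen for illustration only (an assumption, not from the paper).
    The exact gradient is 2 * theta, because E[x^2] = theta^2 + 1.
    """
    rng = random.Random(seed)
    pathwise, score_function = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps  # reparameterized sample: x depends on theta directly
        # Pathwise estimator: differentiate (theta + eps)^2 through the sample path.
        pathwise.append(2.0 * x)
        # Score-function (REINFORCE) estimator: f(x) * d/dtheta log p(x; theta),
        # where d/dtheta log N(x; theta, 1) = x - theta.
        score_function.append(x * x * (x - theta))
    return pathwise, score_function


theta = 1.5
pw, sf = gradient_estimates(theta, n_samples=20000)
true_gradient = 2.0 * theta

print("exact gradient:         ", true_gradient)
print("pathwise mean, variance:", statistics.mean(pw), statistics.variance(pw))
print("score-fn mean, variance:", statistics.mean(sf), statistics.variance(sf))
```

Both estimators are unbiased in this toy setting, but the score-function samples are far more spread out, so many more samples are needed for the same accuracy. The gap reported in the abstract for queueing networks comes from the authors' own experiments and is not reproduced here.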

Related papers

Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization [0.0]
This study focuses on the development of a simulation-driven reinforcement learning (RL) framework for optimizing routing decisions in complex queueing network systems. We propose a robust RL approach leveraging Deep Deterministic Policy Gradient (DDPG) combined with Dyna-style planning (Dyna-DDPG). Comprehensive experiments and rigorous evaluations demonstrate the framework's capability to rapidly learn effective routing policies.
arXiv Detail & Related papers (2025-07-24T20:32:47Z)
Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling [29.431945795881976]
We propose a novel offline reinforcement learning-based algorithm, named SOCD. It learns efficient scheduling policies purely from pre-collected offline data. We show that SOCD is resilient to various system dynamics, including partially observable and large-scale environments.
arXiv Detail & Related papers (2025-01-22T15:13:21Z)
Latent feedback control of distributed systems in multiple scenarios through deep learning-based reduced order models [3.5161229331588095]
Continuous monitoring and real-time control of high-dimensional distributed systems are crucial in applications to ensure a desired physical behavior. Traditional feedback control design that relies on full-order models fails to meet these requirements due to the delay in the control computation. We propose a real-time closed-loop control strategy enhanced by nonlinear non-intrusive Deep Learning-based Reduced Order Models (DL-ROMs).
arXiv Detail & Related papers (2024-12-13T08:04:21Z)
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report [1.4201040196058878]
This work proposes Online Deep Reinforcement Learning-based Controls (ODRLC) as an alternative to traditional Deep Reinforcement Learning (DRL) methods. ODRLC uses online interactions to learn optimal control policies for stochastic queuing networks (SQNs). We introduce a method to design these intervention-assisted policies to ensure strong stability of the network.
arXiv Detail & Related papers (2024-04-05T14:02:04Z)
Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. In this paper, we propose an adaptive scheme for action quantization. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning [9.939081691797858]
We present a reinforcement learning (RL) approach for the multi-agent efficient domain coverage problem involving agents with second-order dynamics. Our proposed network architecture includes the incorporation of LSTM and self-attention, which allows the trained policy to adapt to a variable number of agents.
arXiv Detail & Related papers (2022-11-11T01:59:12Z)
Model-Free Learning of Optimal Deterministic Resource Allocations in Wireless Systems via Action-Space Exploration [4.721069729610892]
We propose a technically grounded and scalable deterministic-dual gradient policy method for efficiently learning optimal parameterized resource allocation policies. Our method not only efficiently exploits gradient availability of popular universal representations such as deep networks, but is also truly model-free, as it relies on consistent zeroth-order gradient approximations of associated random network services constructed via low-dimensional perturbations in action space.
arXiv Detail & Related papers (2021-08-23T18:26:16Z)
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning [33.737301955006345]
Multicasting in wireless systems is a way to exploit the redundancy in user requests in a Content Centric Network. Power control and optimal scheduling can significantly improve the wireless multicast network's performance under fading. We show that power control policy can be learnt for reasonably large systems via this approach.
arXiv Detail & Related papers (2020-09-27T15:59:44Z)
Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives. It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise. We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states. We develop a new framework -- Smooth Regularized Reinforcement Learning (SR$^2$L), where the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.