Related papers: Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization

Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization

URL: http://arxiv.org/abs/2507.18795v1
Date: Thu, 24 Jul 2025 20:32:47 GMT
Title: Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization
Authors: Fatima Al-Ani, Molly Wang, Jevon Charles, Aaron Ong, Joshua Forday, Vinayak Modi,
Abstract summary: This study focuses on the development of a simulation-driven reinforcement learning (RL) framework for optimizing routing decisions in complex queueing network systems.<n>We propose a robust RL approach leveraging Deep Deterministic Policy Gradient (DDPG) combined with Dyna-style planning (Dyna-DDPG)<n> Comprehensive experiments and rigorous evaluations demonstrate the framework's capability to rapidly learn effective routing policies.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study focuses on the development of a simulation-driven reinforcement learning (RL) framework for optimizing routing decisions in complex queueing network systems, with a particular emphasis on manufacturing and communication applications. Recognizing the limitations of traditional queueing methods, which often struggle with dynamic, uncertain environments, we propose a robust RL approach leveraging Deep Deterministic Policy Gradient (DDPG) combined with Dyna-style planning (Dyna-DDPG). The framework includes a flexible and configurable simulation environment capable of modeling diverse queueing scenarios, disruptions, and unpredictable conditions. Our enhanced Dyna-DDPG implementation incorporates separate predictive models for next-state transitions and rewards, significantly improving stability and sample efficiency. Comprehensive experiments and rigorous evaluations demonstrate the framework's capability to rapidly learn effective routing policies that maintain robust performance under disruptions and scale effectively to larger network sizes. Additionally, we highlight strong software engineering practices employed to ensure reproducibility and maintainability of the framework, enabling practical deployment in real-world scenarios.

Related papers

Robustness of Reinforcement Learning-Based Traffic Signal Control under Incidents: A Comparative Study [4.731967623788092]
Reinforcement learning-based traffic signal control (RL-TSC) has emerged as a promising approach for improving urban mobility.<n>In this study, we introduce T-REX, an open-source, SUMO-based simulation framework for training and evaluating RL-TSC methods under dynamic, incident scenarios.
arXiv Detail & Related papers (2025-06-16T08:15:29Z)
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.<n>By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery [3.549243565065057]
Imitation learning is a data-driven approach to learning policies from expert behavior.<n>It is prone to unreliable outcomes in out-of-sample (OOS) regions.<n>We propose a framework for learning policies modeled by contractive dynamical systems.
arXiv Detail & Related papers (2024-12-10T14:28:18Z)
Differentiable Discrete Event Simulation for Queuing Network Control [7.965453961211742]
Queueing network control poses distinct challenges, including highity, large state and action spaces, and lack of stability. We propose a scalable framework for policy optimization based on differentiable discrete event simulation. Our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments.
arXiv Detail & Related papers (2024-09-05T17:53:54Z)
Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch [0.0]
The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to fluctuations in dynamic prices, demand consumption, and renewable-based energy generation. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' nature. We propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the environments and action space operational constraints during online operation.
arXiv Detail & Related papers (2023-07-26T17:12:04Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control [46.81433026280051]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems. Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z)
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters [0.0]
This paper introduces a new framework for benchmarking the performance of an RL agent in network environments simulated with ns-3. Within this framework, we demonstrate that an RL agent without domain-specific knowledge can learn how to efficiently adjust Radio Access Network (RAN) parameters to match offline optimization in static scenarios.
arXiv Detail & Related papers (2022-09-08T12:58:09Z)
Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes. Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states. We develop a new framework -- textbfSmooth textbfRegularized textbfReinforcement textbfLearning ($textbfSR2textbfL$), where the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL. We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.