Solving the non-preemptive two queue polling model with generally
distributed service and switch-over durations and Poisson arrivals as a
Semi-Markov Decision Process
- URL: http://arxiv.org/abs/2112.06578v1
- Date: Mon, 13 Dec 2021 11:40:55 GMT
- Title: Solving the non-preemptive two queue polling model with generally
distributed service and switch-over durations and Poisson arrivals as a
Semi-Markov Decision Process
- Authors: Dylan Solms
- Abstract summary: The polling system with switch-over durations is a useful model with several practical applications.
It is classified as a Discrete Event Dynamic System (DEDS), for which no single agreed-upon modelling approach exists.
This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system so as to introduce additional modelling power.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The polling system with switch-over durations is a useful model with several
practical applications. It is classified as a Discrete Event Dynamic System
(DEDS), for which no single agreed-upon modelling approach exists. Furthermore,
DEDS are quite complex. To date, the most sophisticated approach to modelling
the polling system of interest has been a Continuous-time Markov Decision
Process (CTMDP). This paper presents a Semi-Markov Decision Process (SMDP)
formulation of the polling system so as to introduce additional modelling power.
Such power comes at the expense of truncation errors and expensive numerical
integrals, which naturally leads to the question of whether the SMDP policy
provides a worthwhile advantage. In addition, it is shown how sparsity can be
exploited in the CTMDP to develop a computationally efficient model. The
discounted performance of the SMDP and CTMDP policies is evaluated using a
Semi-Markov Process simulator. The two policies are accompanied by a heuristic
policy specifically developed for this polling system as well as an exhaustive
service policy. Parametric and non-parametric hypothesis tests are used to test
whether differences in performance are statistically significant.
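The expensive numerical integrals mentioned in the abstract arise because, in the SMDP formulation, the state only changes at service and switch-over completions whose durations are generally distributed: both the effective discount factor and the distribution of the number of Poisson arrivals accumulated during one sojourn must be obtained by integrating against the duration density, and the arrival count must be truncated. The sketch below illustrates that computation in Python/SciPy; the lognormal duration, arrival rate, discount rate and truncation level are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of the kind of
# numerical integral a discounted SMDP formulation of the polling model needs:
# the probability of k Poisson(lam) arrivals during one generally distributed
# duration T, discounted at rate beta, i.e. E[e^{-beta*T} e^{-lam*T}(lam*T)^k / k!].
import math

import numpy as np
from scipy import integrate, stats

lam = 0.8                                   # Poisson arrival rate (assumed)
beta = 0.1                                  # continuous-time discount rate (assumed)
duration = stats.lognorm(s=0.5, scale=1.0)  # stand-in "general" service/switch-over duration
K_MAX = 30                                  # truncation level for the arrival count

def discounted_arrival_prob(k: int) -> float:
    """E[e^{-beta*T} * P(k arrivals during T)] for the random duration T."""
    def integrand(t: float) -> float:
        poisson_k = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
        return math.exp(-beta * t) * poisson_k * duration.pdf(t)
    value, _ = integrate.quad(integrand, 0.0, np.inf, limit=200)
    return value

# One row of the discounted SMDP transition kernel: the queue grows by k arrivals
# while the server is busy or switching; mass beyond K_MAX is dropped, which is
# exactly the truncation error the abstract refers to.
row = np.array([discounted_arrival_prob(k) for k in range(K_MAX + 1)])
exact_discount = duration.expect(lambda t: math.exp(-beta * t))  # E[e^{-beta*T}]

print(f"E[e^(-beta*T)]          = {exact_discount:.6f}")
print(f"sum of truncated row    = {row.sum():.6f}")
print(f"mass lost to truncation = {exact_discount - row.sum():.2e}")
```

Summing the row over all k would recover E[e^{-beta*T}] exactly, so the gap printed last quantifies the truncation error for the chosen K_MAX; each matrix entry costs one quadrature, which is where the computational expense comes from.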
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - POMDP inference and robust solution via deep reinforcement learning: An
application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
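At its core, the two-stage recipe summarised above (Bayesian inference of the model parameters, then deep RL with domain randomization) amounts to re-drawing the uncertain parameters from the posterior samples at the start of every training episode. A minimal, hedged illustration of that loop follows; the toy environment, parameter names and placeholder policy are assumptions for illustration, not the paper's actual railway-maintenance model.

```python
# Minimal sketch of domain randomization over MCMC posterior samples: each
# training episode uses one sampled parameter vector, so the policy is trained
# against the whole posterior rather than a single point estimate.
# Everything below (environment, parameters, policy) is a toy stand-in.
import numpy as np

rng = np.random.default_rng(seed=0)
posterior_thetas = rng.dirichlet(np.ones(3), size=500)  # stand-in MCMC output

class ToyDegradationPOMDP:
    """Three-level hidden degradation state observed through noise."""
    def __init__(self, theta):
        self.theta = theta          # uncertain degradation probabilities
        self.state = 0
    def reset(self):
        self.state = 0
        return self._observe()
    def step(self, action):
        if action == 1:             # maintain: reset degradation
            self.state = 0
        elif rng.random() < self.theta[self.state]:
            self.state = min(self.state + 1, 2)
        reward = -float(self.state) - 2.0 * action
        return self._observe(), reward
    def _observe(self):
        return self.state + rng.normal(scale=0.5)

def run_episode(policy, env, horizon=50):
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        obs, reward = env.step(policy(obs))
        total += reward
    return total

policy = lambda obs: int(obs > 1.0)          # placeholder for the deep RL policy
for episode in range(1000):
    theta = posterior_thetas[rng.integers(len(posterior_thetas))]
    run_episode(policy, ToyDegradationPOMDP(theta))  # RL update step omitted
```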
arXiv Detail & Related papers (2023-07-16T15:44:58Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic
Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Formal Controller Synthesis for Markov Jump Linear Systems with
Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z) - Robust Anytime Learning of Markov Decision Processes [8.799182983019557]
In data-driven applications, deriving precise probabilities from limited data introduces statistical errors.
Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions.
We propose a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies.
arXiv Detail & Related papers (2022-05-31T14:29:55Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian
Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP).
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Learning and Solving Regular Decision Processes [15.533842336139067]
Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards.
We build on automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS.
arXiv Detail & Related papers (2020-03-02T16:36:16Z) - DDKSP: A Data-Driven Stochastic Programming Framework for Car-Sharing
Relocation Problem [17.440172040605354]
We investigate the car-sharing relocation problem (CSRP) under uncertain demands.
In order to overcome the problem, an innovative framework called Data-Driven Kernel Programming (DDKSP) is proposed.
The proposed framework outperforms the pure parametric approaches by 3.72%, 4.58% and 11% in terms of overall profits.
arXiv Detail & Related papers (2020-01-20T19:04:29Z)