Solving the non-preemptive two queue polling model with generally
distributed service and switch-over durations and Poisson arrivals as a
Semi-Markov Decision Process
- URL: http://arxiv.org/abs/2112.06578v1
- Date: Mon, 13 Dec 2021 11:40:55 GMT
- Title: Solving the non-preemptive two queue polling model with generally
distributed service and switch-over durations and Poisson arrivals as a
Semi-Markov Decision Process
- Authors: Dylan Solms
- Abstract summary: The polling system with switch-over durations is a useful model with several practical applications.
It is classified as a Discrete Event Dynamic System (DEDS), for which no single agreed-upon modelling approach exists.
This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system so as to introduce additional modelling power.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The polling system with switch-over durations is a useful model with several
practical applications. It is classified as a Discrete Event Dynamic System
(DEDS), for which no single agreed-upon modelling approach exists. Furthermore,
DEDS are quite complex. To date, the most sophisticated approach to modelling
the polling system of interest has been a Continuous-time Markov Decision
Process (CTMDP). This paper presents a Semi-Markov Decision Process (SMDP)
formulation of the polling system so as to introduce additional modelling power.
Such power comes at the expense of truncation errors and expensive numerical
integrals, which naturally leads to the question of whether the SMDP policy
provides a worthwhile advantage. In addition, it is shown how sparsity can be
exploited in the CTMDP to develop a computationally efficient model. The
discounted performance of the SMDP and CTMDP policies is evaluated using a
Semi-Markov Process simulator. The two policies are accompanied by a heuristic
policy specifically developed for this polling system as well as an exhaustive
service policy. Parametric and non-parametric hypothesis tests are used to test
whether differences in performance are statistically significant.
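The expensive numerical integrals mentioned in the abstract arise because, in the SMDP formulation, the state only changes at service and switch-over completions whose durations are generally distributed: both the effective discount factor and the distribution of the number of Poisson arrivals accumulated during one sojourn must be obtained by integrating against the duration density, and the arrival count must be truncated. The sketch below illustrates that computation in Python/SciPy; the lognormal duration, arrival rate, discount rate and truncation level are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of the kind of
# numerical integral a discounted SMDP formulation of the polling model needs:
# the probability of k Poisson(lam) arrivals during one generally distributed
# duration T, discounted at rate beta, i.e. E[e^{-beta*T} e^{-lam*T}(lam*T)^k / k!].
import math

import numpy as np
from scipy import integrate, stats

lam = 0.8                                   # Poisson arrival rate (assumed)
beta = 0.1                                  # continuous-time discount rate (assumed)
duration = stats.lognorm(s=0.5, scale=1.0)  # stand-in "general" service/switch-over duration
K_MAX = 30                                  # truncation level for the arrival count

def discounted_arrival_prob(k: int) -> float:
    """E[e^{-beta*T} * P(k arrivals during T)] for the random duration T."""
    def integrand(t: float) -> float:
        poisson_k = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
        return math.exp(-beta * t) * poisson_k * duration.pdf(t)
    value, _ = integrate.quad(integrand, 0.0, np.inf, limit=200)
    return value

# One row of the discounted SMDP transition kernel: the queue grows by k arrivals
# while the server is busy or switching; mass beyond K_MAX is dropped, which is
# exactly the truncation error the abstract refers to.
row = np.array([discounted_arrival_prob(k) for k in range(K_MAX + 1)])
exact_discount = duration.expect(lambda t: math.exp(-beta * t))  # E[e^{-beta*T}]

print(f"E[e^(-beta*T)]          = {exact_discount:.6f}")
print(f"sum of truncated row    = {row.sum():.6f}")
print(f"mass lost to truncation = {exact_discount - row.sum():.2e}")
```

Summing the row over all k would recover E[e^{-beta*T}] exactly, so the gap printed last quantifies the truncation error for the chosen K_MAX; each matrix entry costs one quadrature, which is where the computational expense comes from.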
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - POMDP inference and robust solution via deep reinforcement learning: An
application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
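At its core, the two-stage recipe summarised above (Bayesian inference of the model parameters, then deep RL with domain randomization) amounts to re-drawing the uncertain parameters from the posterior samples at the start of every training episode. A minimal, hedged illustration of that loop follows; the toy environment, parameter names and placeholder policy are assumptions for illustration, not the paper's actual railway-maintenance model.

```python
# Minimal sketch of domain randomization over MCMC posterior samples: each
# training episode uses one sampled parameter vector, so the policy is trained
# against the whole posterior rather than a single point estimate.
# Everything below (environment, parameters, policy) is a toy stand-in.
import numpy as np

rng = np.random.default_rng(seed=0)
posterior_thetas = rng.dirichlet(np.ones(3), size=500)  # stand-in MCMC output

class ToyDegradationPOMDP:
    """Three-level hidden degradation state observed through noise."""
    def __init__(self, theta):
        self.theta = theta          # uncertain degradation probabilities
        self.state = 0
    def reset(self):
        self.state = 0
        return self._observe()
    def step(self, action):
        if action == 1:             # maintain: reset degradation
            self.state = 0
        elif rng.random() < self.theta[self.state]:
            self.state = min(self.state + 1, 2)
        reward = -float(self.state) - 2.0 * action
        return self._observe(), reward
    def _observe(self):
        return self.state + rng.normal(scale=0.5)

def run_episode(policy, env, horizon=50):
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        obs, reward = env.step(policy(obs))
        total += reward
    return total

policy = lambda obs: int(obs > 1.0)          # placeholder for the deep RL policy
for episode in range(1000):
    theta = posterior_thetas[rng.integers(len(posterior_thetas))]
    run_episode(policy, ToyDegradationPOMDP(theta))  # RL update step omitted
```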
arXiv Detail & Related papers (2023-07-16T15:44:58Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic
Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Formal Controller Synthesis for Markov Jump Linear Systems with
Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z) - Robust Anytime Learning of Markov Decision Processes [8.799182983019557]
In data-driven applications, deriving precise probabilities from limited data introduces statistical errors.
Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions.
We propose a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies.
arXiv Detail & Related papers (2022-05-31T14:29:55Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian
Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP).
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Learning and Solving Regular Decision Processes [15.533842336139067]
Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards.
We build on automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS.
arXiv Detail & Related papers (2020-03-02T16:36:16Z) - DDKSP: A Data-Driven Stochastic Programming Framework for Car-Sharing
Relocation Problem [17.440172040605354]
We investigate the car-sharing relocation problem (CSRP) under uncertain demands.
In order to overcome the problem, an innovative framework called Data-Driven Kernel Programming (DDKSP) is proposed.
The proposed framework outperforms the pure parametric approaches by 3.72%, 4.58% and 11% in terms of overall profits.
arXiv Detail & Related papers (2020-01-20T19:04:29Z)