Optimal Admission Control for Multiclass Queues with Time-Varying
Arrival Rates via State Abstraction
- URL: http://arxiv.org/abs/2203.08019v1
- Date: Mon, 14 Mar 2022 12:38:13 GMT
- Title: Optimal Admission Control for Multiclass Queues with Time-Varying
Arrival Rates via State Abstraction
- Authors: Marc Rigter and Danial Dervovic and Parisa Hassanzadeh and Jason Long
and Parisa Zehtabi and Daniele Magazzeni
- Abstract summary: We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
- Score: 16.99621896314678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a novel queuing problem where the decision-maker must choose to
accept or reject randomly arriving tasks into a bufferless queue served by $N$
identical servers. Each task has a price, which is a positive
real number, and a class. Each class of task has a different price distribution
and service rate, and arrives according to an inhomogeneous Poisson process. The
objective is to decide which tasks to accept so that the total price of tasks
processed is maximised over a finite horizon. We formulate the problem as a
discrete time Markov Decision Process (MDP) with a hybrid state space. We show
that the optimal value function has a specific structure, which enables us to
solve the hybrid MDP exactly. Moreover, we prove that as the time step is
reduced, the discrete time solution approaches the optimal solution to the
original continuous time problem. To improve the scalability of our approach to
a greater number of task classes, we present an approximation based on state
abstraction. We validate our approach on synthetic data, as well as a real
financial fraud data set, which is the motivating application for this work.
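
To make the setting concrete, below is a minimal Python sketch of the continuous-time problem the abstract describes: class-dependent inhomogeneous Poisson arrivals (sampled by thinning), $N$ bufferless servers with exponential service times, and an accept/reject decision at each arrival, evaluated under two naive baseline policies. All rates, price distributions, and the sinusoidal rate profile are hypothetical illustrations; the paper's exact MDP discretization, structural result, and state-abstraction solver are not reproduced here.

```python
import math
import random

# Hypothetical parameters for illustration only; the paper does not specify these.
N = 3            # identical servers, no buffer
HORIZON = 10.0   # finite horizon

# class id -> (peak arrival rate, service rate mu, mean price)
CLASSES = {
    0: (4.0, 1.0, 1.0),
    1: (2.0, 0.5, 3.0),
}

def arrival_rate(c, t):
    """A hypothetical inhomogeneous rate lambda_c(t), here a sinusoidal profile."""
    peak, _, _ = CLASSES[c]
    return peak * (0.5 + 0.5 * math.sin(2.0 * math.pi * t / HORIZON))

def sample_arrivals(c, horizon):
    """Sample arrival times by thinning (Lewis-Shedler) a rate-`peak` Poisson process."""
    peak = CLASSES[c][0]
    t, times = 0.0, []
    while True:
        t += random.expovariate(peak)
        if t > horizon:
            return times
        if random.random() < arrival_rate(c, t) / peak:
            times.append(t)

def simulate(policy):
    """One episode: total price of accepted tasks; rejected or blocked tasks are lost."""
    events = sorted((t, c) for c in CLASSES for t in sample_arrivals(c, HORIZON))
    free_at = [0.0] * N   # time at which each server next becomes idle
    total = 0.0
    for t, c in events:
        busy = sum(1 for f in free_at if f > t)
        price = random.expovariate(1.0 / CLASSES[c][2])   # exponential price draw
        if busy < N and policy(t, c, price, busy):
            total += price
            idx = free_at.index(min(free_at))             # an idle server
            free_at[idx] = t + random.expovariate(CLASSES[c][1])
        # if all servers are busy, the arriving task is blocked (no buffer)
    return total

# Two naive baselines -- NOT the paper's optimal policy.
def accept_all(t, c, price, busy):
    return True

def price_threshold(t, c, price, busy):
    return price >= 1.5

for name, pol in [("accept-all", accept_all), ("threshold", price_threshold)]:
    mean = sum(simulate(pol) for _ in range(2000)) / 2000
    print(f"{name}: mean total price = {mean:.2f}")
```

Swapping the naive threshold for the discrete-time value-function policy would, per the abstract, approach the continuous-time optimum as the time step is reduced.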
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method reduces inference cost while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose \texttt{DASA}, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Solution Path of Time-varying Markov Random Fields with Discrete
Regularization [7.79230326339002]
We study the problem of inferring time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters.
We show that the entire solution path for all sparsity levels can be obtained in $\mathcal{O}(Tp^3)$, where $T$ is the number of time steps and $p$ is the number of unknown parameters at any given time.
arXiv Detail & Related papers (2023-07-25T18:19:50Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Markov Decision Process modeled with Bandits for Sequential Decision
Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe that the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z) - Non-Parametric Stochastic Sequential Assignment With Random Arrival
Times [3.871148938060281]
We consider a problem wherein jobs arrive at random times and assume random values.
We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem.
We prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large.
arXiv Detail & Related papers (2021-06-09T09:41:38Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
arXiv Detail & Related papers (2021-01-28T13:35:37Z) - Contrastive Losses and Solution Caching for Predict-and-Optimize [19.31153168397003]
We use a Noise Contrastive approach to motivate a family of surrogate loss functions.
We address a major bottleneck of all predict-and-optimize approaches.
We show that even a very slow growth rate is enough to match the quality of state-of-the-art methods.
arXiv Detail & Related papers (2020-11-10T19:09:12Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the Peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem into an unconstrained problem, to which a Q-learning based approach is applied.
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.