Optimal Admission Control for Multiclass Queues with Time-Varying
Arrival Rates via State Abstraction
- URL: http://arxiv.org/abs/2203.08019v1
- Date: Mon, 14 Mar 2022 12:38:13 GMT
- Title: Optimal Admission Control for Multiclass Queues with Time-Varying
Arrival Rates via State Abstraction
- Authors: Marc Rigter and Danial Dervovic and Parisa Hassanzadeh and Jason Long
and Parisa Zehtabi and Daniele Magazzeni
- Abstract summary: We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
- Score: 16.99621896314678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a novel queuing problem where the decision-maker must choose to
accept or reject randomly arriving tasks into a bufferless queue served by $N$
identical servers. Each task has a price, which is a positive
real number, and a class. Each class of task has a different price distribution
and service rate, and arrives according to an inhomogeneous Poisson process. The
objective is to decide which tasks to accept so that the total price of tasks
processed is maximised over a finite horizon. We formulate the problem as a
discrete time Markov Decision Process (MDP) with a hybrid state space. We show
that the optimal value function has a specific structure, which enables us to
solve the hybrid MDP exactly. Moreover, we prove that as the time step is
reduced, the discrete time solution approaches the optimal solution to the
original continuous time problem. To improve the scalability of our approach to
a greater number of task classes, we present an approximation based on state
abstraction. We validate our approach on synthetic data, as well as a real
financial fraud data set, which is the motivating application for this work.
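
To make the setting concrete, below is a minimal Python sketch of the continuous-time problem the abstract describes: class-dependent inhomogeneous Poisson arrivals (sampled by thinning), $N$ bufferless servers with exponential service times, and an accept/reject decision at each arrival, evaluated under two naive baseline policies. All rates, price distributions, and the sinusoidal rate profile are hypothetical illustrations; the paper's exact MDP discretization, structural result, and state-abstraction solver are not reproduced here.

```python
import math
import random

# Hypothetical parameters for illustration only; the paper does not specify these.
N = 3            # identical servers, no buffer
HORIZON = 10.0   # finite horizon

# class id -> (peak arrival rate, service rate mu, mean price)
CLASSES = {
    0: (4.0, 1.0, 1.0),
    1: (2.0, 0.5, 3.0),
}

def arrival_rate(c, t):
    """A hypothetical inhomogeneous rate lambda_c(t), here a sinusoidal profile."""
    peak, _, _ = CLASSES[c]
    return peak * (0.5 + 0.5 * math.sin(2.0 * math.pi * t / HORIZON))

def sample_arrivals(c, horizon):
    """Sample arrival times by thinning (Lewis-Shedler) a rate-`peak` Poisson process."""
    peak = CLASSES[c][0]
    t, times = 0.0, []
    while True:
        t += random.expovariate(peak)
        if t > horizon:
            return times
        if random.random() < arrival_rate(c, t) / peak:
            times.append(t)

def simulate(policy):
    """One episode: total price of accepted tasks; rejected or blocked tasks are lost."""
    events = sorted((t, c) for c in CLASSES for t in sample_arrivals(c, HORIZON))
    free_at = [0.0] * N   # time at which each server next becomes idle
    total = 0.0
    for t, c in events:
        busy = sum(1 for f in free_at if f > t)
        price = random.expovariate(1.0 / CLASSES[c][2])   # exponential price draw
        if busy < N and policy(t, c, price, busy):
            total += price
            idx = free_at.index(min(free_at))             # an idle server
            free_at[idx] = t + random.expovariate(CLASSES[c][1])
        # if all servers are busy, the arriving task is blocked (no buffer)
    return total

# Two naive baselines -- NOT the paper's optimal policy.
def accept_all(t, c, price, busy):
    return True

def price_threshold(t, c, price, busy):
    return price >= 1.5

for name, pol in [("accept-all", accept_all), ("threshold", price_threshold)]:
    mean = sum(simulate(pol) for _ in range(2000)) / 2000
    print(f"{name}: mean total price = {mean:.2f}")
```

Swapping the naive threshold for the discrete-time value-function policy would, per the abstract, approach the continuous-time optimum as the time step is reduced.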
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method reduces inference cost while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose \texttt{DASA}, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Solution Path of Time-varying Markov Random Fields with Discrete
Regularization [7.79230326339002]
We study the problem of inferring time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters.
We show that the entire solution path for all sparsity levels can be obtained in $\mathcal{O}(Tp^3)$, where $T$ is the number of time steps and $p$ is the number of unknown parameters at any given time.
arXiv Detail & Related papers (2023-07-25T18:19:50Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Markov Decision Process modeled with Bandits for Sequential Decision
Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe that the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z) - Non-Parametric Stochastic Sequential Assignment With Random Arrival
Times [3.871148938060281]
We consider a problem wherein jobs arrive at random times and assume random values.
We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem.
We prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large.
arXiv Detail & Related papers (2021-06-09T09:41:38Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
arXiv Detail & Related papers (2021-01-28T13:35:37Z) - Contrastive Losses and Solution Caching for Predict-and-Optimize [19.31153168397003]
We use a Noise Contrastive approach to motivate a family of surrogate loss functions.
We address a major bottleneck of all predict-and-optimize approaches.
We show that even a very slow growth rate is enough to match the quality of state-of-the-art methods.
arXiv Detail & Related papers (2020-11-10T19:09:12Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the Peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem into an unconstrained problem, to which a Q-learning based approach is applied.
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.