Queue Scheduling with Adversarial Bandit Learning
- URL: http://arxiv.org/abs/2303.01745v1
- Date: Fri, 3 Mar 2023 07:17:09 GMT
- Title: Queue Scheduling with Adversarial Bandit Learning
- Authors: Jiatai Huang, Leana Golubchik, Longbo Huang
- Abstract summary: We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates.
Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization.
We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
- Score: 20.606599586119835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we study scheduling of a queueing system with zero knowledge
of instantaneous network conditions. We consider a one-hop single-server
queueing system consisting of $K$ queues, each with time-varying and
non-stationary arrival and service rates. Our scheduling approach builds on an
innovative combination of adversarial bandit learning and Lyapunov drift
minimization, without knowledge of the instantaneous network state (the arrival
and service rates) of each queue. We then present two novel algorithms
\texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window
SoftMaxWeight), both capable of stabilizing systems that can be stablized by
some (possibly unknown) sequence of randomized policies whose time-variation
satisfies a mild condition. We further generalize our results to the setting
where arrivals and departures only have bounded moments instead of being
deterministically bounded and propose \texttt{SoftMW+} and \texttt{SSMW+} that
are capable of stabilizing the system. As a building block of our new
algorithms, we also extend the classical \texttt{EXP3.S} (Auer et al., 2002)
algorithm for multi-armed bandits to handle unboundedly large feedback signals,
which can be of independent interest.
Related papers
- Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical SNO algorithms require network conditions to be stationary with time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z) - Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers typically not homogeneous, and can have highly varying processing times.
We introduce a new parallel method Freya to handle arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z) - Quantifying the Cost of Learning in Queueing Systems [4.784875233446591]
Cost of Learning in Queueing (CLQ) is a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty.
We propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
arXiv Detail & Related papers (2023-08-15T14:50:12Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When
Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB)
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Scheduling Servers with Stochastic Bilinear Rewards [7.519872646378837]
A system optimization problem arises in multi-class, multi-server queueing system scheduling.
We propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward.
Our algorithm sub-linear regret and sublinear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability.
arXiv Detail & Related papers (2021-12-13T00:37:20Z) - STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal
Sample and Communication Complexities for Federated Learning [58.6792963686231]
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data.
It is not clear how to choose the WNs' minimum update directions, the first minibatch sizes, and the local update frequency.
We show that there is a trade-off curve between local update frequencies and local mini sizes, on which the above complexities can be maintained.
arXiv Detail & Related papers (2021-06-19T06:13:45Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud
System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.