Related papers: Queue Scheduling with Adversarial Bandit Learning

Queue Scheduling with Adversarial Bandit Learning

URL: http://arxiv.org/abs/2303.01745v1
Date: Fri, 3 Mar 2023 07:17:09 GMT
Title: Queue Scheduling with Adversarial Bandit Learning
Authors: Jiatai Huang, Leana Golubchik, Longbo Huang
Abstract summary: We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization. We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
Score: 20.606599586119835
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions. We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization, without knowledge of the instantaneous network state (the arrival and service rates) of each queue. We then present two novel algorithms \texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window SoftMaxWeight), both capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies whose time-variation satisfies a mild condition. We further generalize our results to the setting where arrivals and departures only have bounded moments instead of being deterministically bounded and propose \texttt{SoftMW+} and \texttt{SSMW+} that are capable of stabilizing the system. As a building block of our new algorithms, we also extend the classical \texttt{EXP3.S} (Auer et al., 2002) algorithm for multi-armed bandits to handle unboundedly large feedback signals, which can be of independent interest.

Related papers

Finite-Time Information-Theoretic Bounds in Queueing Control [54.11376591632282]
We derive new policies that achieve them-for the total queue length in scheduling problems over processing networks with both adversarial and arrivals.<n>These findings reveal a fundamental limitation on "drift-only" methods and point the way toward principled, non-asymptotic optimality in queueing control.
arXiv Detail & Related papers (2025-06-23T04:14:40Z)
Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical SNO algorithms require network conditions to be stationary with time. Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback. Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z)
Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers typically not homogeneous, and can have highly varying processing times. We introduce a new parallel method Freya to handle arbitrarily slow computations. We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
Quantifying the Cost of Learning in Queueing Systems [4.784875233446591]
Cost of Learning in Queueing (CLQ) is a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
arXiv Detail & Related papers (2023-08-15T14:50:12Z)
Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB) This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull. We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z)
Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting. We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
Scheduling Servers with Stochastic Bilinear Rewards [7.519872646378837]
A system optimization problem arises in multi-class, multi-server queueing system scheduling. We propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward. Our algorithm sub-linear regret and sublinear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability.
arXiv Detail & Related papers (2021-12-13T00:37:20Z)
STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning [58.6792963686231]
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. It is not clear how to choose the WNs' minimum update directions, the first minibatch sizes, and the local update frequency. We show that there is a trade-off curve between local update frequencies and local mini sizes, on which the above complexities can be maintained.
arXiv Detail & Related papers (2021-06-19T06:13:45Z)
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems. First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch. Second, for diverse system scales and structures, we use graph neural networks to embed system state information. Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.